GnTIII EXPRESSION IN PLANTS



FIELD OF THE INVENTION 

The invention relates to expression of a mammalian N-acetylglucosaminyltransferase III
(GnTIII) enzyme in plants and its use in producing glycoproteins with bisected oligosaccharides and
an increased amount of terminal GlcNAc residues. The invention further relates to a hybrid protein
comprising the catalytic site of GnTIII and the transmembrane domain of a Golgi apparatus and/or
endoplasmic reticulum (ER) protein, or a modified GnTIII comprising ER retention signals, and to its
use in producing glycoproteins with oligosaccharides that lack immunogenic xylose and fucose
residues.


BACKGROUND OF THE INVENTION 

N-Acetylglucosaminyltransferases (GlcNAc-transferases) are "branching" enzymes that add
an N-acetylglucosamine (GlcNAc) residue to one of the mannoses of the trimannosyl core structure
of typical N-linked glycans. At least six GlcNAc-transferases are known, with little or no sequence
homology. Besides different protein structures, these GlcNAc-transferases also have different
enzymatic properties and substrate specificities. All are typical type II transmembrane proteins with a
cytoplasmic domain, a transmembrane anchor and an extracellular stem region with a catalytic domain.

A remarkable GlcNAc-transferase is GlcNAc-transferase III (GnTIII). GnTIII, also known
as UDP-N-acetylglucosamine:β-D-mannoside β(1,4)-N-acetylglucosaminyltransferase III (EC
2.4.1.144), inserts bisecting GlcNAc residues in complex-type N-linked glycans of cellular
glycoproteins (for a review see Taniguchi, et al., "A glycomic approach to the identification and
characterization of glycoprotein function in cells transfected with glycosyltransferase genes"
Proteomics 1:239-247, 2001). GnTIII adds the GlcNAc through a β(1,4) linkage to the β-linked
mannose of the trimannosyl core structure of the N-linked glycan. GnTIII was first identified in hen
oviduct (Narasimhan S., "Control of glycoprotein synthesis. UDP-GlcNAc:glycopeptide β4-
N-acetylglucosaminyltransferase III, an enzyme in hen oviduct which adds GlcNAc in β1-4 linkage to
the β-linked mannose of the trimannosyl core of N-glycosyl oligosaccharides" The Journal of
Biological Chemistry 257:10235-10242, 1982), but a high level of activity has also been reported in
various types of rat hepatomas, human serum, and liver and hepatoma tissues of patients with
hepatomas and liver cirrhosis (Ishibashi, et al., "N-acetylglucosaminyltransferase III in human serum
and liver and hepatoma tissues: increased activity in liver cirrhosis and hepatoma patients" Clinica
Chimica Acta 185:325, 1989; Narasimhan, et al., "Expression of N-acetylglucosaminyltransferase III in
hepatic nodules during rat liver carcinogenesis promoted by orotic acid" Journal of Biological
Chemistry 263:1273-1281, 1988; Nishikawa, et al., "Determination of N-acetylglucosaminyltransferases
III, IV and V in normal and hepatoma tissues of rats" Biochimica et Biophysica Acta 1035:313-318,
1990; Pascale, et al., "Expression of N-acetylglucosaminyltransferase III in hepatic nodules generated
by different models of rat liver carcinogenesis" Carcinogenesis 10:961-964, 1989). Bisected
oligosaccharides on glycoproteins have been implicated in antibody-dependent cellular cytotoxicity
(ADCC). ADCC is a lytic attack on antibody-targeted cells and is triggered upon binding of
lymphocyte receptors to the constant region (Fc) of antibodies. Controlled expression of GnTIII in
recombinant Chinese Hamster Ovary (CHO) production cell lines that lack GnTIII activity resulted in
antibodies with bisected oligosaccharides and optimized ADCC activity (Davies, et al., "Expression of
GnTIII in a recombinant anti-CD20 CHO production cell line: expression of antibodies with altered
glycoforms leads to an increase in ADCC through higher affinity for FcγRIII" Biotechnology and
Bioengineering 74:288-294, 2001; Umana, et al., "Engineered glycoforms of an antineuroblastoma
IgG1 with optimized antibody-dependent cellular cytotoxic activity" Nature Biotechnology
17:176-180, 1999). The ADCC activity correlated well with the level of Fc region-associated bisected
complex oligosaccharides present on the recombinant antibody (Umana, et al., "Engineered
glycoforms of an antineuroblastoma IgG1 with optimized antibody-dependent cellular cytotoxic
activity" Nature Biotechnology 17:176-180, 1999). Bisecting GlcNAc residues resulting from GnTIII
activity affect the conformation of the sugar chains in such a way that other glycosyltransferases such
as GlcNAc-transferase II and α1,6-fucosyltransferase, but not β(1,4)-galactosyltransferase, can no
longer act (Taniguchi, et al., 2001). Overexpression of GnTIII in CHO cells is lethal.

In contrast to typical mammalian production cell lines such as CHO cells, transgenic plants
are generally recognized as a safe production system for therapeutic proteins. Plant glycoproteins,
however, differ in oligosaccharide structure from those of mammals in several respects. They lack
terminal galactose and sialic acid, and they carry an additional core xylose and a differently linked
core fucose (α-1,3 instead of α-1,6). Like CHO and other pharmaceutical production cell lines, they
also completely lack bisected oligosaccharides. Plants have the capacity to generate the common core
structure GN2M3GN2, but predominantly M3GN2 variants are found, indicating removal of
terminal GN by hexosaminidases.

Biogenesis of N-linked glycans begins with the synthesis of a lipid-linked oligosaccharide
moiety (Glc3Man9GlcNAc2-) which is transferred en bloc to the nascent polypeptide chain in the
endoplasmic reticulum (ER). Through a series of trimming reactions by exoglycosidases in the ER
and cis-Golgi compartments, the so-called "high mannose" (Man9GlcNAc2 to Man5GlcNAc2)
glycans are formed. Subsequently, the formation of complex-type glycans starts with the transfer of
the first GlcNAc onto Man5GlcNAc2 by GnTI and further trimming by mannosidase II (ManII) to
form GlcNAcMan3GlcNAc2. Complex glycan biosynthesis continues while the glycoprotein
progresses through the secretory pathway, with the transfer in the Golgi apparatus of the second
GlcNAc residue by GnTII, as well as of other monosaccharide residues, onto the
GlcNAcMan3GlcNAc2 under the action of several other glycosyltransferases. Plants and mammals
differ with respect to the formation of complex glycans. In plants, complex glycans are characterized
by the presence of a β(1,2)-xylose residue linked to the Man-3 and/or an α(1,3)-fucose residue linked
to GlcNAc-1, instead of an α(1,6)-fucose residue linked to GlcNAc-1 (Lerouge, P., et al., "N-
glycoprotein biosynthesis in plants: recent developments and future trends" Plant Mol Biol 38:31-48,
1998). Genes encoding the corresponding xylosyltransferase (XylT) and fucosyltransferase (FucT)
have been isolated (Strasser R., "Molecular cloning and functional expression of β1,2-
xylosyltransferase cDNA from Arabidopsis thaliana" FEBS Lett. 472:105-108, 2000; Leiter, H., et al.,
"Purification, cDNA cloning, and expression of GDP-L-Fuc:Asn-linked GlcNAc α1,3-
fucosyltransferase from mung beans" J Biol Chem. 274:21830, 1999). Xylose and fucose epitopes are
known to be highly immunogenic and possibly allergenic, which may pose a problem when plants are
used for the production of therapeutic glycoproteins. Moreover, the blood serum of many allergy
patients contains IgE directed against these epitopes, which puts these patients particularly at risk
from treatment with xylose- and fucose-containing recombinant proteins. In addition, this
carbohydrate-directed IgE in sera might cause false positive reactions in in vitro tests using plant
extracts, since there is evidence that these carbohydrate-specific IgEs are not relevant for the
allergenic reaction. Plants possess neither β(1,4)-galactosyltransferases nor α(2,6)-sialyltransferases,
and consequently plant glycans lack the β(1,4)-galactose and terminal α(2,6)-NeuAc residues often
found on mammalian glycans (Vitale and Chrispeels, "Transient N-acetylglucosamine in the
biosynthesis of phytohemagglutinin: attachment in the Golgi apparatus and removal in protein bodies"
J Cell Biol 99:133-140, 1984; Lerouge, P., et al., "N-glycoprotein biosynthesis in plants: recent
developments and future trends" Plant Mol Biol 38:31-48, 1998).

The final glycan structures are determined not only by the mere presence of the enzymes
involved in their biosynthesis but to a large extent by the specific sequence of the various enzymatic
reactions. The latter is controlled by the discrete sequestering and relative positions of these enzymes
throughout the ER and Golgi, which is mediated by the interaction of determinants of the transferase
with specific characteristics of the sub-Golgi compartment for which the transferase is destined. A
number of studies using hybrid molecules have shown that the transmembrane domains of several
glycosyltransferases play a central role in their sub-Golgi sorting (Grabenhorst E., et al., J. Biol
Chem. 274:36107-36116, 1999; Colley, K., Glycobiology 7:1-13, 1997; Munro, S., Trends Cell Biol
8:11-15, 1998; Gleeson P.A., Histochem. Cell Biol 109:517-532, 1998).

Similar to the mammalian production cell lines used in the pharmaceutical industry, plants used to
produce glycoproteins lack GnTIII activity. Plants not only lack GnTIII activity but are completely
devoid of GnTIII-like sequences. In addition, plants also lack GnTIV, GnTV and GnTVI sequences
and, moreover, sialic acid residues (for an overview of the major glycosylation attributes of
commonly used cell expression systems including plants, see Jenkins, et al., "Getting the
glycosylation right: implications for the biotechnology industry" Nature Biotechnology 14:975-979,
1996). Nevertheless, plants are a very potent production system. Plants are generally accepted as
safe and are free of particles infectious to humans. Plant production is easily scalable, and N-linked
glycosylation can be controlled (Bakker, et al., "Galactose-extended glycans of antibodies produced
by transgenic plants" Proc. Natl. Acad. Sci. USA 98:2899-2904, 2001).

Transgenic tobacco plants that produce galactosylated recombinant monoclonal antibodies
(Mabs) upon introduction of the human gene for β(1,4)-galactosyltransferase (hGalT) have been
reported (Bakker, et al., "Galactose-extended glycans of antibodies produced by transgenic plants"
Proc. Natl. Acad. Sci. USA 98:2899-2904, 2001; WO01/31044 and WO01/31045).
Therapeutic glycoproteins can be improved by altering their glycosylation pattern (Davies, et
al., "Expression of GnTIII in a recombinant anti-CD20 CHO production cell line: expression of
antibodies with altered glycoforms leads to an increase in ADCC through higher affinity for FcγRIII"
Biotechnology and Bioengineering 74:288-294, 2001; Umana, et al., "Engineered glycoforms of an
antineuroblastoma IgG1 with optimized antibody-dependent cellular cytotoxic activity" Nature
Biotechnology 17:176-180, 1999; Fukuta, et al., "Remodeling of sugar chain structures of human
interferon-γ" Glycobiology 10:421-430, 2000; Misaizu, et al., "Role of antennary structure of N-
linked sugar chains in renal handling of recombinant human erythropoietin" Blood 86:4097-4104,
1995; Sburlati, et al., "Synthesis of bisected glycoforms of recombinant IFN-β by overexpression of
β-1,4-N-acetylglucosaminyltransferase III in Chinese Hamster Ovary cells" Biotechnology Prog
14:189-192, 1998). Higher oligosaccharide antennarity of EPO, for example, leads to increased in
vivo activity due to reduced kidney filtration (Misaizu, et al., "Role of antennary structure of N-
linked sugar chains in renal handling of recombinant human erythropoietin" Blood 86:4097-4104,
1995). Biosynthesis of such superior glycoforms can be achieved with the "standard" glycosylation
machinery of normal production cell lines by two methodologies. The first is by enriching specific
glycoforms during purification and the second is by introducing mutations in the polypeptide chain.
The latter makes it possible to shift the glycosylation site within the glycoprotein, resulting in
different glycosylation patterns as a result of differences in accessibility. A complementary route
is through genetic engineering of the production cell line itself. New glycosylation patterns can be
obtained through expression of glycosyltransferase and glycosidase genes in production cell lines.
These genes code for enzymes that either add or remove specific saccharides to and from the glycans
of cellular glycoproteins. Several glycosyltransferase genes have been introduced in CHO cells to
manipulate glycoform biosynthesis. One of them is GnTIII. Glycosyltransferase GnTIII is involved
in branching of the N-linked glycan and results in bisecting GlcNAc residues. CHO cells and other
production cell lines typically lack GnTIII activity (Stanley, P. and C.A. Campbell, "A dominant
mutation to ricin resistance in Chinese hamster ovary cells induces UDP-GlcNAc:glycopeptide β-4-
N-acetylglucosaminyltransferase III activity" Journal of Biological Chemistry 261:13370-13378,
1984). Expression of GnTIII in CHO cells resulted in bisected complex oligosaccharides as expected,
but overexpression resulted in growth inhibition and was toxic to cells. Similarly, overexpression of
GnTV, another glycosyltransferase, which introduces triantennary sugar chains, also resulted in
growth inhibition, suggesting that this may be a general feature of glycosyltransferase overexpression
(Umana, et al., "Engineered glycoforms of an antineuroblastoma IgG1 with optimized antibody-
dependent cellular cytotoxic activity" Nature Biotechnology 17:176-180, 1999).

Therefore, there is a need to provide a means for producing glycoproteins in plants with
human-compatible, non-immunogenic bisected oligosaccharides.



SUMMARY OF THE INVENTION 

The invention relates to expression of a mammalian N-acetylglucosaminyltransferase III
(GnTIII) enzyme in plants and its use in producing glycoproteins with bisected oligosaccharides and
an increased amount of terminal GlcNAc residues. The invention further relates to a hybrid protein
comprising the catalytic site of GnTIII and the transmembrane domain of a Golgi apparatus and/or
endoplasmic reticulum (ER) protein, or a modified GnTIII comprising ER retention signals, and to its
use in producing glycoproteins with oligosaccharides that lack immunogenic xylose and fucose
residues.

WO 03/078614
PCT/IB03/01562

In one embodiment, the present invention contemplates a plant host system comprising or
expressing a mammalian UDP-N-acetylglucosamine:β-D-mannoside β(1,4)-
N-acetylglucosaminyltransferase (GnTIII) enzyme (nucleotide sequence: SEQ ID NO.: 1, Genbank
I.D. number AL022312 (Dunham, I., et al. Nature 402:489-495, 1999); protein sequence: SEQ ID
NO.: 2, Genbank I.D. number Q09327), wherein said GnTIII inserts bisecting N-acetylglucosamine
(GlcNAc) residues in complex-type N-linked glycans of a glycoprotein present in said plant host
system.

In a specific embodiment of the invention, the plant host system further comprises a
heterologous glycoprotein or functional fragment thereof comprising bisected oligosaccharides,
particularly bisected oligosaccharides with galactose residues. The GnTIII inserts bisecting GlcNAc
residues onto said heterologous glycoprotein.

In one embodiment, the present invention contemplates a method for obtaining a plant host
system expressing a heterologous glycoprotein comprising bisected oligosaccharides. In one
embodiment, the method comprises crossing a plant expressing a heterologous glycoprotein with a
plant expressing said GnTIII, harvesting progeny from said crossing and selecting a desired progeny
plant expressing said heterologous glycoprotein and expressing mammalian GnTIII. Alternatively,
said plant host system may be obtained by introducing into a plant or portion thereof a nucleic acid
encoding said mammalian GnTIII and a nucleic acid encoding said heterologous glycoprotein, and
isolating a plant or portion thereof expressing said heterologous glycoprotein and expressing
mammalian GnTIII, which is normally not present in plants. Furthermore, the invention is directed to
a method for obtaining said heterologous glycoprotein from said plant, comprising obtaining a plant
host system using either of the procedures described above and further isolating said heterologous
glycoprotein.
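The crossing route yields a mixed population of progeny, which is why a selection step is needed. As a rough illustration only, the sketch below enumerates the gametes of the two parents and computes the expected fraction of progeny that carry both transgenes. It assumes, hypothetically, single-copy transgenes at unlinked loci with simple Mendelian segregation; real transformants may carry multiple or linked insertions, and carrying a transgene does not by itself guarantee expression. The labels and function names are illustrative, not part of the described method.

```python
from fractions import Fraction
from itertools import product
from typing import FrozenSet, List, Tuple

# Hypothetical model: a parent is a list of loci, each an (allele, allele) pair.
# "" means "no transgene" on that homolog. Loci are ASSUMED unlinked, so each
# locus segregates independently (simple Mendelian inheritance).
Genotype = List[Tuple[str, str]]


def gametes(parent: Genotype) -> List[Tuple[FrozenSet[str], Fraction]]:
    """Enumerate a parent's possible gametes with their probabilities."""
    result = []
    # One allele is drawn per locus; every combination has probability (1/2)^n.
    for choice in product(*parent):
        prob = Fraction(1, 2) ** len(parent)
        result.append((frozenset(a for a in choice if a), prob))
    return result


def fraction_with_all(p1: Genotype, p2: Genotype, wanted: set) -> Fraction:
    """Expected fraction of progeny that carry every transgene in `wanted`."""
    total = Fraction(0)
    for (g1, q1), (g2, q2) in product(gametes(p1), gametes(p2)):
        if wanted <= (g1 | g2):  # progeny inherits the union of both gametes
            total += q1 * q2
    return total


glycoprotein_parent = [("glycoprotein", "")]  # hemizygous for the glycoprotein gene
gntiii_parent = [("GnTIII", "")]              # hemizygous for the GnTIII gene
print(fraction_with_all(glycoprotein_parent, gntiii_parent, {"glycoprotein", "GnTIII"}))  # 1/4
```

Under these assumptions only one in four progeny of two hemizygous parents carries both genes, whereas parents that are homozygous for their respective transgenes would give progeny that all carry both.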

In another embodiment, it is contemplated that the plant host system of the present invention
further comprises a functional mammalian enzyme providing N-glycan biosynthesis that is normally
not present in plants, thereby, for example, providing the capacity to extend an N-linked glycan by the
addition of a galactose as described in WO 01/21045 (herein incorporated by reference). In another
embodiment, the present invention further contemplates a plant host system obtained by crossing a
plant comprising a functional protein such as a transporter protein or an enzyme (e.g., a mammalian
protein) or functional fragment thereof, wherein said protein provides N-glycan biosynthesis, with a
plant comprising said mammalian GnTIII. In another embodiment, the present invention
contemplates harvesting the progeny from said crossing and selecting a desired progeny plant
expressing said functional protein such as, for example, a transporter protein or enzyme or functional
fragment thereof. In yet another embodiment of the present invention, it is contemplated that the
selected progeny expresses both the protein providing N-glycan biosynthesis and the mammalian
GnTIII. In still yet another embodiment, the present invention contemplates a plant host system
obtained by introducing into a plant or portion thereof a nucleic acid encoding the GnTIII and a
nucleic acid encoding a functional protein (for example, a transporter or an enzyme [e.g., mammalian]
or functional fragment thereof) providing N-glycan biosynthesis, and isolating said plant or portion
thereof expressing the functional protein or functional fragment thereof providing N-glycan
biosynthesis and said mammalian GnTIII. Although the present invention is not limited to any
particular theory or mechanism, it is believed that such a combination increases galactosylation of a
heterologous glycoprotein. Additionally, in one embodiment, it is contemplated that GnTIII and other
proteins providing N-glycosylation, such as GalT, can also be introduced simultaneously via one
transformation vector.

In one embodiment, the present invention contemplates a plant host system expressing said
heterologous glycoprotein (wherein said heterologous glycoprotein has increased galactosylation)
and methods for obtaining said plant host system and said heterologous glycoprotein. In another
embodiment, the plant host system may be obtained by crossing a plant comprising mammalian
GnTIII and a functional protein (for example, a transporter or an enzyme [e.g., mammalian] or
functional fragment thereof that provides N-glycan biosynthesis not normally found in plants) with a
plant comprising a heterologous glycoprotein, and then selecting said progeny plants. In yet another
embodiment, it is contemplated that said heterologous glycoprotein may be obtained by introducing
nucleic acid sequences encoding 1) said GnTIII, 2) said functional protein or enzyme providing
N-glycan biosynthesis not normally found in plants and 3) said heterologous glycoprotein into said
plant or portion thereof, and isolating said plant or portion thereof expressing said nucleic acid
sequences. In another embodiment of the present invention, it is contemplated that the heterologous
glycoproteins will be isolated or purified from the plant host systems.

In one embodiment of the present invention, an isolated hybrid protein is contemplated, wherein the
hybrid protein comprises 1) a catalytic portion of mammalian GnTIII and 2) a transmembrane portion
of a protein from, for example, the endoplasmic reticulum or Golgi apparatus of a eukaryotic cell. In
another embodiment, the present invention also contemplates a modified mammalian GnTIII
comprising a retention signal such as KDEL for retention of said GnTIII in the ER. In yet another
embodiment, the present invention contemplates 1) nucleic acid sequences encoding said hybrid
proteins and said modified mammalian GnTIII, 2) vectors comprising said nucleic acid sequences and
3) plant host systems comprising said sequences. In one embodiment, these hybrid proteins and
modified GnTIIIs may act to relocalize GnTIII activity in the endoplasmic reticulum (ER) and/or
Golgi apparatus. In another embodiment, the present invention contemplates methods for obtaining
these hybrid proteins and modified GnTIII proteins by, for example, introducing sequences encoding
said hybrid proteins or modified GnTIIIs into a plant or portion thereof. Although the present
invention is not limited to any particular theory or mechanism, it is believed that, as a result of such
relocalization, bisecting GlcNAc is introduced earlier in the N-glycan biosynthesis sequence of
reactions, thereby preventing subsequent enzymatic reactions, and, as a consequence, a heterologous
protein expressed in a plant host system (for example, the plant host system of the present invention)
will lack xylose and fucose and have an increased amount of terminal GlcNAc. Accordingly, one
embodiment of the present invention contemplates a method to provide a plant host system expressing
a heterologous glycoprotein (said plant host system having the capacity to extend an N-linked glycan
with galactose) comprising 1) crossing a plant comprising said hybrid protein or said modified GnTIII
with a plant comprising said heterologous protein and 2) selecting said desired progeny. In another
embodiment, the present invention contemplates 1) introducing into a plant or portion thereof a
nucleic acid sequence encoding said modified GnTIII or said hybrid protein and said heterologous
glycoprotein and 2) isolating said plant or portion thereof expressing a heterologous glycoprotein with
the capacity to extend an N-linked glycan with galactose. In yet another embodiment, the present
invention contemplates a method for obtaining said desired heterologous glycoprotein, said method
comprising isolating said glycoprotein from said plant or portion thereof.
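The relocalization rationale is essentially an argument about reaction order: the same set of enzymes can give different products depending on which acts first. The toy model below is offered only as an illustration of that logic, not as a model of real Golgi enzymology. It treats a glycan as a set of features and applies enzymes in a given order, skipping any enzyme that is blocked by an existing bisecting GlcNAc. The blocking of GnTII, and the non-blocking of β(1,4)-galactosyltransferase, follow the background section; treating plant XylT and FucT as likewise blocked is an assumption mirroring the belief stated above. Feature labels and the two enzyme orders are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Toy representation: a glycan is only the set of features it carries.
Glycan = FrozenSet[str]
BISECT = "bisecting GlcNAc"

# enzyme -> (blocked once a bisecting GlcNAc is present?, feature it adds)
# GnTII blocked and GalT not blocked: per the background section.
# XylT and FucT blocked: an ASSUMPTION mirroring the relocalization rationale.
ENZYMES: Dict[str, Tuple[bool, str]] = {
    "GnTII": (True, "second GlcNAc"),
    "GnTIII": (True, BISECT),
    "XylT": (True, "beta1,2-xylose"),
    "FucT": (True, "alpha1,3-fucose"),
    "GalT": (False, "beta1,4-galactose"),
}


def process(start: Iterable[str], order: Iterable[str]) -> Glycan:
    """Apply enzymes in order; a blocked enzyme leaves the glycan unchanged."""
    glycan: Glycan = frozenset(start)
    for name in order:
        blocked_by_bisect, feature = ENZYMES[name]
        if blocked_by_bisect and BISECT in glycan:
            continue  # toy rule: a bisected substrate is not accepted
        glycan = glycan | {feature}
    return glycan


START = {"GlcNAcMan3GlcNAc2"}
# Hypothetical orders: GnTIII acting late versus relocated to act first.
late = process(START, ["GnTII", "XylT", "FucT", "GnTIII", "GalT"])
early = process(START, ["GnTIII", "GnTII", "XylT", "FucT", "GalT"])
print(sorted(early))  # ['GlcNAcMan3GlcNAc2', 'beta1,4-galactose', 'bisecting GlcNAc']
```

In this toy setting the late order picks up xylose and fucose before bisection, while the early order does not, which is the qualitative outcome the rationale predicts.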

In one embodiment, the present invention contemplates that the plant-derived glycoprotein or
functional fragment thereof may be used for the production of a pharmaceutical composition (for
example, an antibody, a hormone, a vaccine antigen, an enzyme, or the like). In another
embodiment, the present invention contemplates a pharmaceutical composition comprising a
glycoprotein or functional fragment thereof.

In one embodiment, the present invention contemplates variants or mutants of GnTIII. The
terms "variant" and "mutant", when used in reference to a polypeptide, refer to an amino acid
sequence that differs by one or more amino acids from another, usually related, polypeptide. In
another embodiment, the present invention contemplates variants that have "conservative" changes,
wherein a substituted amino acid has similar structural or chemical properties. One type of
conservative amino acid substitution refers to the interchangeability of residues having similar side
chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine,
leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and
threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine;
a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a
group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino
acids having sulfur-containing side chains is cysteine and methionine. Preferred conservative amino
acid substitution groups are: valine (V)-leucine (L)-isoleucine (I), phenylalanine (F)-tyrosine (Y),
lysine (K)-arginine (R), alanine (A)-valine (V), and asparagine (N)-glutamine (Q).
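The side-chain groups and preferred pairs listed above can be encoded directly as a lookup, for example to screen candidate substitutions before functional testing. The sketch below is a minimal illustration using only the groups named in the text (one-letter codes); the function names are hypothetical, and it makes no position-specific judgement about whether a change preserves activity.

```python
# Side-chain groups and preferred substitution groups as listed in the text.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
}
PREFERRED_GROUPS = [set("VLI"), set("FY"), set("KR"), set("AV"), set("NQ")]


def _shares_group(a: str, b: str, groups) -> bool:
    """True if two different residues fall in a common group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in groups)


def is_conservative(a: str, b: str) -> bool:
    """Conservative: both residues share one of the side-chain groups."""
    return _shares_group(a, b, SIDE_CHAIN_GROUPS.values())


def is_preferred_conservative(a: str, b: str) -> bool:
    """Preferred: both residues share one of the preferred substitution groups."""
    return _shares_group(a, b, PREFERRED_GROUPS)


print(is_conservative("K", "R"), is_preferred_conservative("K", "R"))  # True True
print(is_conservative("G", "A"), is_preferred_conservative("G", "A"))  # True False
print(is_conservative("G", "W"))  # False (the non-conservative example in the text)
```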

In yet another embodiment, the present invention contemplates variants that have "non-
conservative" changes (e.g., replacement of a glycine with a tryptophan). Similar minor variations
may also include amino acid deletions or insertions (i.e., additions), or both. Guidance in
determining which and how many amino acid residues may be substituted, inserted or deleted
without abolishing biological activity may be found using computer programs well known in the art,
for example, DNAStar software. Variants can be tested in functional assays. For both conservative
and non-conservative variants, preferred variants have less than 10%, more preferably less than 5%,
and still more preferably less than 2% changes (whether substitutions, deletions, and so on).
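The text does not specify how the percentage of changes is computed. One plausible reading is sketched below, under stated assumptions: the variant and reference are already aligned to equal length (gaps marked with "-"), each substituted, inserted or deleted position counts as one change, and the denominator is the number of residues in the reference. The function names and the tier labels are hypothetical.

```python
def percent_changed(reference: str, variant: str, gap: str = "-") -> float:
    """Percentage of changed positions between two pre-aligned sequences.

    ASSUMPTIONS: equal-length alignment; a substitution, insertion or deletion
    each counts as one change; denominator = residues in the reference.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    ref_residues = sum(1 for r in reference if r != gap)
    if ref_residues == 0:
        raise ValueError("reference contains no residues")
    # Positions where both sequences hold a gap compare equal and are not counted.
    changes = sum(1 for r, v in zip(reference, variant) if r != v)
    return 100.0 * changes / ref_residues


def preference_tier(pct: float) -> str:
    """Map a percentage onto the preference tiers named in the text."""
    if pct < 2:
        return "still more preferred (<2%)"
    if pct < 5:
        return "more preferred (<5%)"
    if pct < 10:
        return "preferred (<10%)"
    return "outside the preferred range"


print(percent_changed("MKTAYIAKQR", "MKTAYLAKQR"))  # 10.0 (one substitution in ten residues)
```

For a protein of roughly 500 residues, this reading would allow up to 9 changed positions in the "still more preferred" tier.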

In one embodiment, the present invention contemplates a plant host (cell) system comprising
a mammalian UDP-N-acetylglucosamine:β-D-mannoside β(1,4)-N-acetylglucosaminyltransferase
(GnTIII) enzyme (or portion or variant thereof), wherein said GnTIII inserts bisecting
N-acetylglucosamine (GlcNAc) residues in complex-type N-linked glycans of a glycoprotein present in
said plant host system. In another embodiment, the present invention contemplates the plant host
system, wherein said GnTIII is a human GnTIII. In yet another embodiment, the present invention
contemplates the plant host system, wherein said system is a portion of a plant. In yet another
embodiment, the present invention contemplates the plant host system, wherein said system is a
portion of a plant selected from the group consisting of a cell, leaf, embryo, callus, stem, pericarp,
protoplast, root, tuber, kernel and endosperm. In yet another embodiment, the present invention
contemplates the plant host system, wherein said system is a whole plant. In yet another
embodiment, the present invention contemplates the plant host system, further comprising a
heterologous glycoprotein (or functional fragment thereof). In yet another embodiment, the present
invention contemplates the plant host system, wherein said heterologous glycoprotein comprises an
antibody, or fragment (e.g., Fc, Fv, Fab, Fab2) thereof. In yet another embodiment, the present
invention contemplates the plant host system, wherein said heterologous glycoprotein or functional
fragment thereof comprises bisected oligosaccharides. In yet another embodiment, the present
invention contemplates the plant host system, wherein said heterologous glycoprotein (or functional
fragment thereof) comprises bisected glycans with galactose residues. In yet another embodiment, the
present invention contemplates the plant host system, wherein said plant is a tobacco plant. In yet
another embodiment, the present invention contemplates the plant host system, which further
comprises a functional protein selected from the group consisting of a transporter and a (mammalian)
enzyme (or functional fragment thereof) providing N-glycan biosynthesis. In yet another
embodiment, the present invention contemplates the plant host system, wherein said enzyme is a
(human) β-1,4-galactosyltransferase. In yet another embodiment, the present invention contemplates
the plant host system, which further comprises a heterologous glycoprotein having an increased
number of galactose residues. In yet another embodiment, the present invention contemplates a plant
host system comprising a nucleic acid sequence encoding a mammalian GnTIII protein. In yet
another embodiment, the present invention contemplates a plant host system comprising a vector
comprising a nucleic acid sequence encoding a mammalian GnTIII protein. In yet another
embodiment, the present invention contemplates the plant host system, which further comprises a
nucleic acid sequence encoding a functional protein selected from the group consisting of a
transporter and a (mammalian) enzyme (or functional fragment thereof) providing N-glycan
biosynthesis.

In one embodiment, the present invention contemplates a method (for obtaining a plant host
system expressing a heterologous glycoprotein having bisected oligosaccharides) comprising a)
crossing a plant expressing a heterologous glycoprotein with a plant expressing a mammalian GnTIII,
b) harvesting progeny from said crossing and c) selecting a desired progeny plant (expressing said
heterologous glycoprotein and expressing a mammalian GnTIII that is normally not present in
plants). In another embodiment, the present invention contemplates this method, wherein said
desired progeny plant expresses said heterologous glycoprotein having bisected oligosaccharides. In
yet another embodiment, the present invention contemplates this method, wherein said plant host
system is a transgenic plant.
In one embodiment, the present invention contemplates a method for obtaining a
heterologous glycoprotein having bisected oligosaccharides comprising a) introducing into a plant
host system a nucleic acid sequence encoding GnTIII, which is normally not present in plants, and a
nucleic acid sequence encoding a heterologous glycoprotein and b) isolating said heterologous
glycoprotein. In another embodiment, the present invention contemplates this method, wherein said
nucleic acid sequences are introduced into a plant cell and said plant cell is regenerated into a plant. In
yet another embodiment, the present invention contemplates the same method, wherein said nucleic
acid sequences are introduced into a plant host system by transforming said plant host system with a
vector comprising a nucleic acid sequence encoding GnTIII, which is normally not present in plants,
and a nucleic acid sequence encoding a heterologous glycoprotein. In yet another embodiment, the
present invention contemplates the method, wherein said nucleic acid sequences are introduced into a
plant host system by transforming said plant with a vector comprising a nucleic acid sequence
encoding GnTIII, which is normally not present in plants, and a vector comprising a nucleic acid
sequence encoding a heterologous glycoprotein. In yet another embodiment, the present invention
contemplates a method for obtaining a heterologous glycoprotein having bisected oligosaccharides
comprising cultivating the regenerated plant.
In one embodiment, the present invention contemplates a method for obtaining a desired
glycoprotein (or functional fragment thereof) comprising a) cultivating the plant host system (until
said plant has reached a harvestable stage) and b) harvesting said plant (and fractionating it to obtain
fractionated plant material, and c) at least partly isolating said glycoprotein from said fractionated
plant material). In another embodiment, the present invention contemplates a plant obtainable by the
contemplated method.

In one embodiment, the present invention contemplates a method for obtaining a plant host
system comprising a functional protein selected from a group consisting of a transporter or a
(mammalian) enzyme or functional fragment thereof providing N-glycan biosynthesis and a
mammalian GnTIII, comprising crossing a plant comprising a functional protein such as a transporter
or a (mammalian) enzyme or functional fragment thereof providing N-glycan biosynthesis with a
plant according to Claim 5, harvesting progeny from said crossing and selecting a desired progeny
plant expressing said functional protein such as a transporter or a (mammalian) enzyme or functional
fragment thereof providing N-glycan biosynthesis and said mammalian GnTIII. In another
embodiment, the present invention contemplates a transgenic plant obtained according to the
method contemplated.

In one embodiment, the present invention contemplates a method for increasing
galactosylation of a heterologous glycoprotein expressed in a plant host system comprising
introducing a nucleic acid sequence encoding GnTIII and a sequence selected from a group
consisting of sequences that encode a transporter or a (mammalian) enzyme or functional fragment not
normally present in a plant into said plant host system expressing said heterologous glycoprotein and
isolating said glycoprotein.

In one embodiment, the present invention contemplates a plant-derived glycoprotein
comprising bisected oligosaccharides.
In one embodiment, the present invention contemplates the use of a plant host system
contemplated by the present invention to produce a desired glycoprotein or functional fragment
thereof. In another embodiment, the present invention contemplates that said glycoprotein or
functional fragment thereof comprises bisected oligosaccharides. In yet another embodiment, the
present invention contemplates a plant-derived glycoprotein or functional fragment thereof obtained
by a method contemplated by the present invention. In yet another embodiment, the present invention
contemplates the use of a glycoprotein or functional fragment thereof contemplated by the invention for the
production of a pharmaceutical composition. In yet another embodiment, the present invention
contemplates a composition comprising a glycoprotein or functional fragment thereof as
contemplated by the present invention.
In one embodiment, the present invention contemplates an isolated hybrid protein comprising
an active site of GnTIII and a transmembrane region of a protein, said protein residing in the
endoplasmic reticulum or Golgi apparatus of a eukaryotic cell. In another embodiment, the present
invention contemplates the protein of the present invention, wherein said protein residing in the
endoplasmic reticulum or Golgi apparatus of a eukaryotic cell is an enzyme. In yet another
embodiment, the present invention contemplates the protein of the present invention, wherein
said protein residing in the endoplasmic reticulum or Golgi apparatus of a eukaryotic cell is a
glycosyltransferase. In yet another embodiment, the present invention contemplates the protein of the
present invention, wherein said protein residing in the endoplasmic reticulum or Golgi apparatus of a
eukaryotic cell is a glycosyltransferase selected from the group consisting of mannosidase I,
mannosidase II, GnTI, GnTII, XylT and FucT. In yet another embodiment, the present invention
contemplates the protein of the present invention, wherein said protein residing in the endoplasmic
reticulum or Golgi apparatus of a eukaryotic cell is a plant protein. In yet another embodiment, the
present invention contemplates an isolated nucleic acid sequence encoding the protein of the present
invention. In yet another embodiment, the present invention contemplates a vector comprising the
isolated nucleic acid sequence of the present invention. In yet another embodiment, the present
invention contemplates a plant comprising the isolated nucleic acid sequence of the present
invention. In yet another embodiment, the present invention contemplates the plant(s) of the present
invention which further comprises a nucleic acid sequence encoding a heterologous glycoprotein.

In one embodiment, the present invention contemplates a method (for providing a transgenic
plant capable of expressing a heterologous glycoprotein with the capacity to extend an N-linked
glycan with galactose) comprising a) crossing a transgenic plant with a plant of the present invention,
b) harvesting progeny from said crossing and c) selecting a desired progeny plant (expressing said
recombinant protein and expressing a functional (mammalian) enzyme involved in (mammalian)
N-glycan biosynthesis that is normally not present in plants).

In one embodiment, the present invention contemplates a method for providing a transgenic
plant capable of expressing a heterologous glycoprotein with the capacity to extend an N-linked
glycan with galactose comprising introducing the nucleic acid sequence of the present invention and
a nucleic acid sequence encoding said heterologous glycoprotein.

In one embodiment, the present invention contemplates a method, comprising: a) providing:
i) a plant cell, and ii) an expression vector comprising nucleic acid encoding a GnTIII enzyme; and
b) introducing said expression vector into said plant cell under conditions such that said enzyme is
expressed. In another embodiment, the present invention contemplates the method, wherein said
nucleic acid encoding a GnTIII comprises the nucleic acid sequence of SEQ ID NO:1.

In one embodiment, the present invention contemplates a method, comprising: a) providing:
i) a plant cell, ii) a first expression vector comprising nucleic acid encoding a GnTIII enzyme, and
iii) a second expression vector comprising nucleic acid encoding a heterologous glycoprotein; and b)
introducing said first and second expression vectors into said plant cell under conditions such that
said GnTIII enzyme and said heterologous protein are expressed. In another embodiment, the present
invention contemplates the method, wherein said heterologous protein is an antibody or antibody
fragment.

In one embodiment, the present invention contemplates a method, comprising: a) providing:
i) a first plant comprising a first expression vector, said first vector comprising nucleic acid
encoding a GnTIII enzyme, and ii) a second plant comprising a second expression vector, said
second vector comprising nucleic acid encoding a heterologous protein; and b) crossing said first
plant and said second plant to produce progeny expressing said GnTIII enzyme and said heterologous
protein.

In one embodiment, the present invention contemplates a plant comprising first and second
expression vectors, said first vector comprising nucleic acid encoding a GnTIII enzyme, said second
vector comprising nucleic acid encoding a heterologous protein. In another embodiment, the present
invention contemplates the plant, wherein said heterologous protein is an antibody or antibody fragment.

BRIEF DESCRIPTION OF THE FIGURES 

Figures 1A and 1B show MALDI-TOF mass spectra of (A) N-linked glycans isolated from
leaves of a control tobacco plant and (B) N-linked glycans isolated from leaves of the selected GnTIII-17
tobacco plant transformed with human GnTIII. See Table 1 for structures.

Figure 2 shows the processing of the high-mannose-type glycan (M9) to complex-type
glycans under the successive action of ManI, GnTI, ManII, and GnTII. The figure also indicates the
glycan structures to which the action of GalT and/or GnTIII at different points in the chain of
reactions would lead. The reactions catalyzed by fucosyltransferases and
xylosyltransferases are not indicated. Core GlcNAc (Gn) is not indicated. Gn =
GlcNAc, Gn^ = bisecting GlcNAc, G = galactose and M = mannose.

Figures 3A and 3B show (A) the T-DNA construct carrying the genes encoding glycan-modifying
enzymes to produce efficiently galactosylated bisected glycans that are devoid of
immunogenic xylose and fucose and (B) the T-DNA construct carrying antibody light chain and
heavy chain genes. TmXyl = transmembrane domain of xylosyltransferase, TmGnTI =
transmembrane domain of GnTI, P = promoter, R = selection marker, L = antibody light chain and H
= antibody heavy chain.

Figures 4A and 4B show (A) the nucleotide sequence (SEQ ID NO: 1, underlined portion of
Figure 4A) and (B) the protein sequence (SEQ ID NO: 2, underlined portion of Figure 4B) of GnTIII
including a c-myc tag. Residues that can undergo conservative amino acid substitutions are defined
in the DEFINITIONS section.

Figures 5A and 5B show (A) a map of the plasmid pDAB4005 and (B) the nucleotide
sequence of the plasmid pDAB4005 (SEQ ID NO: 8).

Figures 6A and 6B show (A) a map of the plasmid pDAB7119 and (B) the nucleotide
sequence of the plasmid pDAB7119 (SEQ ID NO: 9) including splice sites.

Figures 7A and 7B show (A) a map of the plasmid pDAB8504 and (B) the nucleotide
sequence of the plasmid pDAB8504 (SEQ ID NO: 10).
Figures 8A and 8B show (A) a map of the plasmid pDAB7113 and (B) the nucleotide
sequence of the plasmid pDAB7113 (SEQ ID NO: 11) including splice sites.

Figures 9A and 9B show MALDI-TOF mass spectra of glycoproteins from control and
GnTIII corn, comparing mass spectra of N-glycans of glycoproteins isolated from calli of (A)
control corn and (B) selected GnTIII corn. GnTIII corn was obtained through transformation with the
human GnTIII gene sequence, and selection was performed by lectin blotting using E-PHA. See
Table 3 for an annotation of the data contained in Figures 9A and 9B.

Figure 10 shows the full nucleotide sequence of GnTIII without a c-myc tag (SEQ ID NO: 7).
Figure 11 shows MALDI-TOF mass spectra of glycoproteins from control and GnTIII
corn-2. See Table 4 for structures and abbreviations.
Figure 12 shows a representative blot of samples of transgenic maize callus for altered lectin
binding due to expression of the GnTIII gene.

Figure 13 shows a representative blot of samples of transgenic maize callus for c-myc epitope 
expression. 

Figures 14A and 14B show MALDI-TOF mass spectra of glycoproteins from (A) control
and (B) GnTIII corn plants.



DEFINITIONS 

The terms "protein" and "polypeptide" refer to compounds comprising amino acids joined via
peptide bonds and are used interchangeably. A "protein" or "polypeptide" encoded by a gene is not
limited to the amino acid sequence encoded by the gene, but includes post-translational modifications
of the protein.

The term "glycoprotein" refers to proteins with covalently attached sugar units, either bonded
via the OH group of serine or threonine (O-glycosylated) or through the amide NH2 of asparagine
(N-glycosylated). "Glycoprotein" may include, but is not limited to, for example, most secreted proteins
(serum albumin is the major exception) and proteins exposed at the outer surface of the plasma
membrane. Sugar residues found include, but are not limited to: mannose, N-acetylglucosamine,
N-acetylgalactosamine, galactose, fucose and sialic acid.

Where the term "amino acid sequence" is recited herein to refer to an amino acid sequence of
a protein molecule, "amino acid sequence" and like terms, such as "polypeptide" or "protein" are not
meant to limit the amino acid sequence to the complete, native amino acid sequence associated with
the recited protein molecule. Furthermore, an "amino acid sequence" can be deduced from the
nucleic acid sequence encoding the protein.

The term "portion" when used in reference to a protein (as in "a portion of a given protein")
refers to fragments of that protein. The fragments may range in size from four amino acid residues to
the entire amino acid sequence minus one amino acid.

The term "chimera" when used in reference to a polypeptide refers to the expression product
of two or more coding sequences obtained from different genes, that have been cloned together and
that, after translation, act as a single polypeptide sequence. Chimeric polypeptides are also referred
to as "hybrid" polypeptides. The coding sequences include those obtained from the same or from
different species of organisms.

The term "fusion" when used in reference to a polypeptide refers to a chimeric protein
containing a protein of interest joined to an exogenous protein fragment (the fusion partner). The
fusion partner may serve various functions, including enhancement of solubility of the polypeptide of
interest, as well as providing an "affinity tag" to allow purification of the recombinant fusion
polypeptide from a host cell or from a supernatant or from both. If desired, the fusion partner may be
removed from the protein of interest after or during purification.

The term "homolog" or "homologous" when used in reference to a polypeptide refers to a
high degree of sequence identity between two polypeptides, or to a high degree of similarity between
their three-dimensional structures, or to a high degree of similarity between their active sites and
mechanisms of action. In a preferred embodiment, a homolog has greater than 60% sequence
identity, more preferably greater than 75% sequence identity, and still more preferably greater
than 90% sequence identity with a reference sequence.

As applied to polypeptides, the term "substantial identity" means that two peptide sequences,
when optimally aligned, such as by the programs GAP or BESTFIT using default gap weights, share
at least 80 percent sequence identity, preferably at least 90 percent sequence identity, more
preferably at least 95 percent sequence identity or more (e.g., 99 percent sequence identity).
Preferably, residue positions which are not identical differ by conservative amino acid substitutions.
The terms "variant" and "mutant" when used in reference to a polypeptide refer to an amino
acid sequence that differs by one or more amino acids from another, usually related, polypeptide. The
variant may have "conservative" changes, wherein a substituted amino acid has similar structural or
chemical properties. One type of conservative amino acid substitution refers to the
interchangeability of residues having similar side chains. For example, a group of amino acids having
aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids
having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having
amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side
chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is
lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is
cysteine and methionine. Preferred conservative amino acid substitution groups are: valine (V)-leucine
(L)-isoleucine (I), phenylalanine (F)-tyrosine (Y), lysine (K)-arginine (R), alanine (A)-valine (V),
and asparagine (N)-glutamine (Q). More rarely, a variant may have "non-conservative"
changes (e.g., replacement of a glycine with a tryptophan). Similar minor variations may also
include amino acid deletions or insertions (i.e., additions), or both. Guidance in determining which
and how many amino acid residues may be substituted, inserted or deleted without abolishing
biological activity may be found using computer programs well known in the art, for example,
DNAStar software. Variants can be tested in functional assays. Preferred variants have less than 10%,
preferably less than 5%, and still more preferably less than 2% changes (whether
substitutions, deletions, and so on).
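The side-chain groups listed above can be applied mechanically when comparing a variant with its parent sequence. The following Python sketch is offered only as an illustration, not as the method of the invention: it assumes two equal-length, ungapped amino acid sequences in one-letter code, classifies each differing position as a conservative substitution (both residues in one of the groups above) or a non-conservative one, and reports the percentage of changed positions for comparison with the 10%, 5% and 2% preferences. The function names are hypothetical, and insertions and deletions are not handled.

```python
# Side-chain groups as listed in the definition of "variant" (one-letter codes).
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
}


def is_conservative(a: str, b: str) -> bool:
    """Return True if residues a and b fall in the same side-chain group."""
    return any(a in group and b in group for group in SIDE_CHAIN_GROUPS.values())


def classify_substitutions(reference: str, variant: str) -> tuple[int, int, float]:
    """Count conservative and non-conservative substitutions between two
    equal-length, ungapped sequences; also return the percentage of changed positions."""
    if len(reference) != len(variant) or not reference:
        raise ValueError("sketch expects non-empty sequences of equal length")
    conservative = non_conservative = 0
    for a, b in zip(reference.upper(), variant.upper()):
        if a == b:
            continue  # identical position, not a change
        if is_conservative(a, b):
            conservative += 1
        else:
            non_conservative += 1
    percent_changed = 100.0 * (conservative + non_conservative) / len(reference)
    return conservative, non_conservative, percent_changed


# G->W is non-conservative; V->I and K->R are conservative (10 residues compared).
print(classify_substitutions("GAVLKFSTNC", "WAILRFSTNC"))  # (2, 1, 30.0)
```

In this toy example 30% of positions are changed, so the variant would fall outside even the broadest (less than 10%) preference stated above.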
The term "domain" when used in reference to a polypeptide refers to a subsection of the
polypeptide which possesses a unique structural and/or functional characteristic; typically, this
characteristic is similar across diverse polypeptides. The subsection typically comprises contiguous
amino acids, although it may also comprise amino acids which act in concert or which are in close
proximity due to folding or other configurations.
The term "gene" refers to a nucleic acid (e.g., DNA or RNA) sequence that comprises coding
sequences necessary for the production of an RNA, or a polypeptide or its precursor (e.g.,
proinsulin). A functional polypeptide can be encoded by a full-length coding sequence or by any
portion of the coding sequence as long as the desired activity or functional properties (e.g., enzymatic
activity, ligand binding, signal transduction, etc.) of the polypeptide are retained. The term
"portion" when used in reference to a gene refers to fragments of that gene. The fragments may
range in size from a few nucleotides (e.g., ten nucleotides) to the entire gene sequence minus one
nucleotide. Thus, "a nucleotide comprising at least a portion of a gene" may comprise fragments of
the gene or the entire gene.

The term "gene" also encompasses the coding regions of a structural gene and includes
sequences located adjacent to the coding region on both the 5' and 3' ends for a distance of about 1 kb
on either end such that the gene corresponds to the length of the full-length mRNA. The sequences
which are located 5' of the coding region and which are present on the mRNA are referred to as 5'
non-translated sequences. The sequences which are located 3' or downstream of the coding region
and which are present on the mRNA are referred to as 3' non-translated sequences. The term "gene"
encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains
the coding region interrupted with non-coding sequences termed "introns" or "intervening regions" or
"intervening sequences." Introns are segments of a gene which are transcribed into nuclear RNA
(hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or
"spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger
RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order
of amino acids in a nascent polypeptide.
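The removal of introns from the primary transcript can be illustrated with a short Python sketch. It assumes the intron boundaries are already known and supplied as hypothetical 0-based, half-open coordinates; it simply joins the remaining exon segments in order and does not model how splice sites are recognized. The function name and the toy sequences are illustrative only.

```python
def splice(primary_transcript: str, introns: list[tuple[int, int]]) -> str:
    """Return the transcript with the given intron intervals removed.

    introns: hypothetical (start, end) pairs, 0-based and half-open, that must
    not overlap; the remaining exon segments are joined in order.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end <= start or end > len(primary_transcript):
            raise ValueError(f"invalid or overlapping intron ({start}, {end})")
        exons.append(primary_transcript[position:start])  # exon before this intron
        position = end  # skip over the intron
    exons.append(primary_transcript[position:])  # final exon
    return "".join(exons)


# Toy primary transcript: exon 1 + intron + exon 2 (sequences are illustrative only).
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCUUUUAA
```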

In addition to containing introns, genomic forms of a gene may also include sequences
located on both the 5' and 3' ends of the sequences which are present on the RNA transcript. These
sequences are referred to as "flanking" sequences or regions (these flanking sequences are located 5'
or 3' to the non-translated sequences present on the mRNA transcript). The 5' flanking region may
contain regulatory sequences such as promoters and enhancers which control or influence the
transcription of the gene. The 3' flanking region may contain sequences which direct the termination
of transcription, post-transcriptional cleavage and polyadenylation.

The term "heterologous" when used in reference to a gene refers to a gene encoding a factor 
that is not in its natural environment (i.e., has been altered by the hand of man). For example, a
heterologous gene includes a gene from one species introduced into another species. A heterologous 
gene also includes a gene native to an organism that has been altered in some way (e.g., mutated, 
added in multiple copies, linked to a non-native promoter or enhancer sequence, etc.). Heterologous 
genes may comprise gene sequences that comprise cDNA forms of a gene; the cDNA sequences may 
be expressed in either a sense (to produce mRNA) or anti-sense orientation (to produce an anti-sense 
RNA transcript that is complementary to the mRNA transcript). Heterologous genes are 
distinguished from endogenous genes in that the heterologous gene sequences are typically joined to 
nucleotide sequences comprising regulatory elements such as promoters that are not found naturally
associated with the gene for the protein encoded by the heterologous gene or with gene sequences in 
the chromosome, or are associated with portions of the chromosome not found in nature (e.g., genes 
expressed in loci where the gene is not normally expressed). 

A "heterologous glycoprotein" is a glycoprotein originating from a species other than the 
plant host system. The glycoprotein may include but is not limited to antibodies, hormones, growth 
factors, and growth factor receptors, antigens, cytokines and blood products. 

A "plant host system" may include, but is not limited to, a plant or portion thereof which
includes, but is not limited to, a plant cell, plant organ and/or plant tissue. The plant may be a
monocotyledon (monocot), which is a flowering plant whose embryos have one cotyledon or seed
leaf, and includes but is not limited to lilies, grasses, corn (Zea mays), rice, grains including oats,
wheat and barley, orchids, irises, onions and palms. Alternatively, the plant may be a dicotyledon
(dicot), which includes, but is not limited to, tobacco (Nicotiana), tomatoes, potatoes, legumes (e.g.,
alfalfa and soybeans), roses, daisies, cacti, violets and duckweed. The plant may also be a moss,
which includes, but is not limited to, Physcomitrella patens. The invention is further directed to a
method for obtaining said bisected GlcNAc in a plant host system by introducing a nucleic acid
encoding said GnTIII into a plant or portion thereof, expressing said GnTIII, and isolating said
plant or portion thereof expressing said GnTIII.

The term "nucleotide sequence of interest" or "nucleic acid sequence of interest" refers to any
nucleotide sequence (e.g., RNA or DNA), the manipulation of which may be deemed desirable for
any reason (e.g., treat disease, confer improved qualities, etc.), by one of ordinary skill in the art.
Such nucleotide sequences include, but are not limited to, coding sequences of structural genes (e.g.,
reporter genes, selection marker genes, oncogenes, drug resistance genes, growth factors, etc.), and
non-coding regulatory sequences which do not encode an mRNA or protein product (e.g., promoter
sequence, polyadenylation sequence, termination sequence, enhancer sequence and other like
sequences). The present invention contemplates host cells expressing a heterologous protein
encoded by a nucleotide sequence of interest along with one or more hybrid enzymes.

The term "structural" when used in reference to a gene or to a nucleotide or nucleic acid 
sequence refers to a gene or a nucleotide or nucleic acid sequence whose ultimate expression product 
is a protein (such as an enzyme or a structural protein), an rRNA, an sRNA, a tRNA, etc. 

The terms "oligonucleotide" or "polynucleotide" or "nucleotide" or "nucleic acid" refer to a
molecule comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than
three, and usually more than ten. The exact size will depend on many factors, which in turn depend
on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any
manner, including chemical synthesis, DNA replication, reverse transcription, or a combination
thereof.

The terms "an oligonucleotide having a nucleotide sequence encoding a gene" or "a nucleic 
acid sequence encoding" a specified polypeptide refer to a nucleic acid sequence comprising the 
coding region of a gene or in other words the nucleic acid sequence which encodes a gene product. 
The coding region may be present in a cDNA, genomic DNA or RNA form. When present in
a DNA form, the oligonucleotide may be single-stranded (i.e., the sense strand) or double-stranded. 
Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. 
may be placed in close proximity to the coding region of the gene if needed to permit proper 
initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the 
coding region utilized in the expression vectors of the present invention may contain endogenous 
enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a
combination of both endogenous and exogenous control elements. 

The term "recombinant" when made in reference to a nucleic acid molecule refers to a 
nucleic acid molecule which is comprised of segments of nucleic acid joined together by means of 
molecular biological techniques. The term "recombinant" when made in reference to a protein or a 
polypeptide refers to a protein molecule which is expressed using a recombinant nucleic acid 
molecule. 

As used herein, the terms "complementary" or "complementarity" are used in reference to
nucleotide sequences related by the base-pairing rules. For example, the sequence 5'-AGT-3' is
complementary to the sequence 5'-ACT-3'. Complementarity can be "partial" or "total." "Partial"
complementarity is where one or more nucleic acid bases is not matched according to the base
pairing rules. "Total" or "complete" complementarity between nucleic acids is where each and every
nucleic acid base is matched with another base under the base pairing rules. The degree of
complementarity between nucleic acid strands has significant effects on the efficiency and strength
of hybridization between nucleic acid strands.
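The 5'-AGT-3' / 5'-ACT-3' example above can be checked with a short Python sketch. It assumes DNA bases only and two strands of equal length paired antiparallel without gaps; the function names are hypothetical. The returned fraction is 1.0 for total complementarity and below 1.0 for partial complementarity.

```python
# Watson-Crick pairing for DNA bases only (RNA U is not handled in this sketch).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return, written 5'->3', the strand that pairs with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions at which two 5'->3' strands of equal length
    base-pair when aligned antiparallel without gaps (1.0 = total)."""
    if len(strand_a) != len(strand_b) or not strand_a:
        raise ValueError("sketch expects non-empty strands of equal length")
    expected = reverse_complement(strand_a)
    matches = sum(x == y for x, y in zip(expected, strand_b.upper()))
    return matches / len(strand_a)


print(reverse_complement("AGT"))      # ACT
print(complementarity("AGT", "ACT"))  # 1.0 (total complementarity)
print(complementarity("AGT", "ACA"))  # partial: 2 of 3 positions pair
```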

A "complement" of a nucleic acid sequence as used herein refers to a nucleotide sequence
whose nucleic acids show total complementarity to the nucleic acids of the nucleic acid sequence.
For example, the present invention contemplates the complements of SEQ ID NO: 1.

The term "homology" when used in relation to nucleic acids refers to a degree of
complementarity. There may be partial homology (i.e., partial identity) or complete homology (i.e.,
complete identity). A partially complementary sequence is one that at least partially inhibits a
completely complementary sequence from hybridizing to a target nucleic acid and is referred to using
the functional term "substantially homologous." The inhibition of hybridization of the completely
complementary sequence to the target sequence may be examined using a hybridization assay
(Southern or Northern blot, solution hybridization and the like) under conditions of low stringency.
A substantially homologous sequence or probe (i.e., an oligonucleotide which is capable of
hybridizing to another oligonucleotide of interest) will compete for and inhibit the binding (i.e., the
hybridization) of a completely homologous sequence to a target under conditions of low stringency.
This is not to say that conditions of low stringency are such that non-specific binding is permitted;
low stringency conditions require that the binding of two sequences to one another be a specific (i.e.,
selective) interaction. The absence of non-specific binding may be tested by the use of a second
target which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in
the absence of non-specific binding the probe will not hybridize to the second non-complementary
target.

When used in reference to a double-stranded nucleic acid sequence such as a cDNA or
genomic clone, the term "substantially homologous" refers to any probe which can hybridize to either
or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as
described infra.

When used in reference to a single-stranded nucleic acid sequence, the term "substantially
homologous" refers to any probe which can hybridize to the single-stranded nucleic acid sequence
under conditions of low stringency as described infra.

The following terms are used to describe the sequence relationships between two or more 
polynucleotides: "reference sequence,*' "sequence identity," "percentage of sequence identity" and 
"substantial identity." A "reference sequence" is a defined sequence used as a basis for a sequence 
35 comparison; a reference sequence may be a subset of a larger sequence, for example, as a segment of 
a full-length cDNA sequence given in a sequence listing or may comprise a complete gene sequence. 
Generally, a reference sequence is at least 20 nucleotides in lengtii, fi-equently at least 25 nucleotides 
in length, and often at least 50 nucleotides in length. Since two polynucleotides may each (1) 
comprise a sequence (i.e., a portion of the complete polynucleotide sequence) that is similar between 
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the two polynucleotides, and (2) may further comprise a sequence that is divergent between the two 
polynucleotides, sequence comparisons between two (or more) polynucleotides are typically 
performed by comparing sequences of the two polynucleotides over a "comparison window" to 
identify and compare local regions of sequence similarity. A "comparison window " as used herein, 

5 refers to a conceptual segment of at least 20 contiguous nucleotide positions wherein a 

polynucleotide sequence may be compared to a reference sequence of at least 20 contiguous 
nucleotides and wherein the portion of the polynucleotide sequence in the comparison window may 
comprise additions or deletions gaps) of 20 percent or less as compared to the reference 
sequence (which does not comprise additions or deletions) for optimal alignment of the two 

1 0 sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by 
the local homology algorithm of Smith and Waterman [Smith and Waterman, Adv. Appl. Math. 2:482
(1981)], by the homology alignment algorithm of Needleman and Wunsch [Needleman and Wunsch, J. Mol.
Biol. 48:443 (1970)], by the search for similarity method of Pearson and Lipman [Pearson and Lipman,
Proc. Natl. Acad. Sci. (U.S.A.) 85:2444 (1988)], by computerized implementations of these algorithms
(GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics
Computer Group, 575 Science Dr., Madison, Wis.), or by inspection, and the best alignment (i.e.,
resulting in the highest percentage of homology over the comparison window) generated by the various
methods is selected. The term "sequence identity" means that two polynucleotide sequences are
identical (i.e., on a nucleotide-by-nucleotide basis) over the window of comparison. The term
"percentage of sequence identity" is calculated by comparing two optimally aligned sequences over the
window of comparison, determining the number of positions at which the identical nucleic acid base
(e.g., A, T, C, G, U, or I) occurs in both sequences to yield the number of matched positions, dividing
the number of matched positions by the total number of positions in the window of comparison (i.e.,
the window size), and multiplying the result by 100 to yield the percentage of sequence identity. The
term "substantial identity" as used herein denotes a characteristic of a polynucleotide sequence,
wherein the polynucleotide comprises a sequence that has at least 85 percent sequence identity,
preferably at least 90 to 95 percent sequence identity, more usually at least 99 percent sequence
identity as compared to a reference sequence over a comparison window of at least 20 nucleotide
positions, frequently over a window of at least 25-50 nucleotides, wherein the percentage of sequence
identity is calculated by comparing the reference sequence to the polynucleotide sequence, which may
include deletions or additions that total 20 percent or less of the reference sequence over the window
of comparison. The reference sequence may be a subset of a larger sequence, for example, a segment of
the full-length sequences of the compositions claimed in the present invention.
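As an illustration only, the percentage-of-identity calculation and the substantial-identity thresholds defined above can be sketched in Python. The sketch assumes the two sequences have already been optimally aligned (for example, by one of the algorithms cited above) and are supplied as equal-length strings in which '-' marks a gap. The function names, and the choice to count every gapped column as a deletion or addition, are hypothetical and not part of the definitions.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percentage of sequence identity over a pre-aligned comparison window.

    Both strings must have the same length; '-' marks a gap. Every column,
    including gap columns, counts toward the window size.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_ref:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for a, b in zip(aligned_ref.upper(), aligned_query.upper())
        if a == b and a != "-"
    )
    # Matched positions divided by the window size, multiplied by 100.
    return 100.0 * matched / len(aligned_ref)


def has_substantial_identity(aligned_ref: str, aligned_query: str,
                             min_identity: float = 85.0,
                             min_window: int = 20) -> bool:
    """Hypothetical check against the thresholds in the definition above."""
    window = len(aligned_ref)
    ref_length = sum(1 for c in aligned_ref if c != "-")
    # Columns with a gap in either sequence are counted as deletions/additions.
    indels = sum(1 for a, b in zip(aligned_ref, aligned_query) if "-" in (a, b))
    return (
        window >= min_window
        and indels <= 0.20 * ref_length
        and percent_identity(aligned_ref, aligned_query) >= min_identity
    )


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # prints 90.0
```

Because the window size includes gap columns, a gapped alignment scores lower than the same matches over an ungapped window, which follows the "total number of positions in the window of comparison" wording.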

The term "hybridization" refers to the pairing of complementary nucleic acids. Hybridization and the
strength of hybridization (i.e., the strength of the association between the nucleic acids) are
affected by such factors as the degree of complementarity between the nucleic acids, the stringency
of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids.
A single molecule that contains pairing of complementary nucleic acids within its structure is said
to be "self-hybridized."

The term "Tm" refers to the "melting temperature" of a nucleic acid. The melting temperature is the
temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated
into single strands. The equation for calculating the Tm of nucleic acids is well known in the art.
As indicated by standard references, a simple estimate of the Tm value may be calculated by the
equation Tm = 81.5 + 0.41(% G + C) when a nucleic acid is in aqueous solution at 1 M NaCl (see, e.g.,
Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization [1985]). Other
references include more sophisticated computations that take structural as well as sequence
characteristics into account for the calculation of Tm.
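The simple estimate quoted above can be written as a short Python function. This is a sketch of that one formula only: it assumes the 1 M NaCl aqueous conditions stated in the text and does not attempt the more sophisticated, structure-aware computations mentioned; the function name is hypothetical.

```python
def simple_tm(sequence: str) -> float:
    """Simple Tm estimate in degrees C: Tm = 81.5 + 0.41 * (% G + C).

    Applies to a nucleic acid in aqueous solution at 1 M NaCl; it ignores
    length, mismatches and secondary structure.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc_percent = 100.0 * sum(1 for base in seq if base in "GC") / len(seq)
    return 81.5 + 0.41 * gc_percent


print(simple_tm("GCGCATAT"))  # 50 % G + C
```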

The term "stringency" refers to the conditions of temperature, ionic strength, and the presence of
other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. Under
"high stringency" conditions, nucleic acid base pairing will occur only between nucleic acid fragments
that have a high frequency of complementary base sequences. Thus, conditions of "low" stringency are
often required with nucleic acids that are derived from organisms that are genetically diverse, as
the frequency of complementary sequences is usually lower.

"Low stringency conditions" when used in reference to nucleic acid hybridization comprise 
conditions equivalent to binding or hybridization at 42 °C in a solution consisting of 5X SSPE (43.8
g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1% SDS, 5X
Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA
(Fraction V; Sigma)] and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution
comprising 5X SSPE, 0.1% SDS at 42 °C when a probe of about 500 nucleotides in length is employed.

"Medium stringency conditions" when used in reference to nucleic acid hybridization 
comprise conditions equivalent to binding or hybridization at 42 °C in a solution consisting of 5X
SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS,
5X Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution
comprising 1.0X SSPE, 1.0% SDS at 42 °C when a probe of about 500 nucleotides in length is employed.

"High stringency conditions" when used in reference to nucleic acid hybridization comprise 
conditions equivalent to binding or hybridization at 42 °C in a solution consisting of 5X SSPE (43.8
g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X
Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution
comprising 0.1X SSPE, 1.0% SDS at 42 °C when a probe of about 500 nucleotides in length is
employed.
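For bench planning, the 5X SSPE composition and the three wash conditions above can be held in a small table and scaled to other strengths and volumes. This Python sketch assumes that SSPE composition scales linearly with its X-strength (1X being one fifth of the 5X recipe) and does not model the pH adjustment; the names `SSPE_5X_G_PER_L`, `WASH` and `sspe_grams` are hypothetical.

```python
# Per-litre composition of 5X SSPE as given in the stringency definitions
# (pH adjustment to 7.4 with NaOH is not modelled here).
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "EDTA": 1.85}

# Wash solution for each stringency level: (SSPE strength in X, % SDS),
# applied at 42 degrees C with a probe of about 500 nucleotides.
WASH = {
    "low": (5.0, 0.1),
    "medium": (1.0, 1.0),
    "high": (0.1, 1.0),
}


def sspe_grams(strength_x: float, volume_l: float) -> dict:
    """Grams of each component for volume_l litres of SSPE at strength_x.

    Assumes the composition scales linearly from the 5X recipe.
    """
    if strength_x <= 0 or volume_l <= 0:
        raise ValueError("strength and volume must be positive")
    scale = (strength_x / 5.0) * volume_l
    return {name: round(g_per_l * scale, 4) for name, g_per_l in SSPE_5X_G_PER_L.items()}


for level, (strength, sds) in WASH.items():
    print(level, f"{strength}X SSPE, {sds}% SDS:", sspe_grams(strength, 1.0))
```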

It is well known that numerous equivalent conditions may be employed to comprise low stringency
conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and the
nature of the target (DNA, RNA, base composition, present in solution or immobilized, etc.) and the
concentration of the salts and other components (e.g., the presence or absence of formamide, dextran
sulfate, polyethylene glycol) are considered, and the hybridization solution may be varied to
generate conditions of low stringency hybridization different from, but equivalent to, the
above-listed conditions. In addition, the art knows conditions that promote hybridization under
conditions of high stringency (e.g., increasing the temperature of the hybridization and/or wash
steps, the use of formamide in the hybridization solution, etc.).
Additionally, the term "equivalent," when made in reference to a hybridization condition as it relates
to a hybridization condition of interest, means that the hybridization condition and the hybridization
condition of interest result in hybridization of nucleic acid sequences that have the same range of
percent (%) homology. For example, if a hybridization condition of interest results in hybridization
of a first nucleic acid sequence with other nucleic acid sequences that have from 50% to 70% homology
to the first nucleic acid sequence, then another hybridization condition is said to be equivalent to
the hybridization condition of interest if this other hybridization condition also results in
hybridization of the first nucleic acid sequence with the other nucleic acid sequences that have from
50% to 70% homology to the first nucleic acid sequence.
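In practice, the "same range of percent homology" can only be compared over a finite panel of tested sequences. The following Python sketch makes that assumption explicit: each condition is represented by hypothetical results mapping a tested sequence's % homology to whether it hybridized, and two conditions are treated as equivalent when the hybridizing sequences span the same minimum and maximum homology.

```python
from typing import Dict, Optional, Tuple


def hybridizing_range(results: Dict[float, bool]) -> Optional[Tuple[float, float]]:
    """Smallest and largest % homology among tested sequences that hybridized,
    or None if none hybridized under the condition."""
    hits = [homology for homology, hybridized in results.items() if hybridized]
    return (min(hits), max(hits)) if hits else None


def are_equivalent(results_a: Dict[float, bool], results_b: Dict[float, bool]) -> bool:
    """Treat two conditions as equivalent when their hybridizing ranges match."""
    return hybridizing_range(results_a) == hybridizing_range(results_b)


# Hypothetical panel: the condition of interest hybridizes 50 % to 70 % homologs.
condition_of_interest = {40.0: False, 50.0: True, 60.0: True, 70.0: True, 80.0: False}
other_condition = {45.0: False, 50.0: True, 70.0: True}
print(are_equivalent(condition_of_interest, other_condition))
```

The result is only as good as the panel: sequences between or outside the tested homology values are not checked.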

When used in reference to nucleic acid hybridization, the art knows well that numerous equivalent
conditions may be employed to comprise either low or high stringency conditions; factors such as the
length and nature (DNA, RNA, base composition) of the probe and the nature of the target (DNA, RNA,
base composition, present in solution or immobilized, etc.) and the concentration of the salts and
other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene glycol)
are considered, and the hybridization solution may be varied to generate conditions of either low or
high stringency hybridization different from, but equivalent to, the above-listed conditions.
The term "wild-type" when made in reference to a gene refers to a gene that has the characteristics
of a gene isolated from a naturally occurring source. The term "wild-type" when made in reference to a
gene product refers to a gene product that has the characteristics of a gene product isolated from a
naturally occurring source. The term "naturally-occurring" as applied to an object refers to the fact
that an object can be found in nature. For example, a polypeptide or polynucleotide sequence that is
present in an organism (including viruses) that can be isolated from a source in nature and that has
not been intentionally modified by man in the laboratory is naturally-occurring. A wild-type gene is
frequently the gene that is most commonly observed in a population and is thus arbitrarily designated
the "normal" or "wild-type" form of the gene. In contrast, the term "modified" or "mutant" when made
in reference to a gene or to a gene product refers, respectively, to a gene or to a gene product that
displays modifications in sequence and/or functional properties (i.e., altered characteristics) when
compared to the wild-type gene or gene product. It is noted that naturally-occurring mutants can be
isolated; these are identified by the fact that they have altered characteristics when compared to
the wild-type gene or gene product.
Thus, the terms "variant" and "mutant" when used in reference to a nucleotide sequence refer to a
nucleic acid sequence that differs by one or more nucleotides from another, usually related, nucleic
acid sequence. A "variation" is a difference between two different nucleotide sequences; typically,
one sequence is a reference sequence.

The term "polymorphic locus" refers to a genetic locus present in a population that shows variation
between members of the population (i.e., the most common allele has a frequency of less than 0.95).
Thus, "polymorphism" refers to the existence of a character in two or more variant forms in a
population. A "single nucleotide polymorphism" (or SNP) refers to a genetic locus of a single base
that may be occupied by one of at least two different nucleotides. In contrast, a "monomorphic locus"
refers to a genetic locus at which little or no variation is seen between members of the population
(generally taken to be a locus at which the most common allele exceeds a frequency of 0.95 in the gene
pool of the population).
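The 0.95 frequency cut-off above lends itself to a direct classification from allele counts. In this Python sketch the function name and the input format (one allele label per sampled chromosome) are assumptions; a frequency of exactly 0.95, which the definitions leave open, is placed in the monomorphic branch.

```python
from collections import Counter
from typing import Iterable


def classify_locus(alleles: Iterable[str], cutoff: float = 0.95) -> str:
    """Label a locus from the alleles observed in a population sample.

    Polymorphic: the most common allele has a frequency below the cutoff.
    Otherwise the locus is treated as monomorphic (a frequency of exactly
    0.95 falls into this branch by assumption).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    top_frequency = max(counts.values()) / total
    return "polymorphic" if top_frequency < cutoff else "monomorphic"


print(classify_locus(["A"] * 90 + ["G"] * 10))  # most common allele at 0.90
```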

A "frameshift mutation" refers to a mutation in a nucleotide sequence, usually resulting from
insertion or deletion of a single nucleotide (or two or four nucleotides), that results in a change in
the correct reading frame of a structural DNA sequence encoding a protein. The altered reading frame
usually results in the translated amino-acid sequence being changed or truncated.
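Because codons are read three bases at a time, an insertion or deletion shifts the frame whenever its length is not a multiple of three. The Python sketch below illustrates this with a hypothetical nine-base coding sequence; the function names are illustrative only.

```python
from typing import List


def is_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    if indel_length <= 0:
        raise ValueError("indel length must be positive")
    return indel_length % 3 != 0


def codons(coding_sequence: str) -> List[str]:
    """Split a coding sequence into complete codons, reading from the first base."""
    return [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence) - 2, 3)]


reference = "ATGGCTGAA"
mutant = reference[:4] + reference[5:]  # delete the single base at index 4
print(codons(reference), codons(mutant))
```

After the one-base deletion, every codon downstream of the deletion changes and the trailing incomplete codon is dropped, which is the "changed or truncated" outcome described above.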

A "splice mutation" refers to any mutation that affects gene expression by affecting correct RNA
splicing. Splice mutations may be due to mutations at intron-exon boundaries that alter splice sites.

The term "detection assay" refers to an assay for detecting the presence or absence of a sequence or a
variant nucleic acid sequence (e.g., a mutation or polymorphism in a given allele of a particular gene,
such as the GnTIII gene, SEQ ID NO: 1, Figure 4A), for detecting the presence or absence of a
particular protein (e.g., GnTIII, SEQ ID NO: 2, Figure 4B) or the structure, activity or effect of a
particular protein (e.g., GnTIII activity), for detecting glycosylation moieties on a particular
protein (e.g., N-linked glycans), or for detecting the presence or absence of a variant of a
particular protein.

The term "antisense" refers to a deoxyribonucleotide sequence whose sequence of deoxyribonucleotide
residues is in reverse 5' to 3' orientation in relation to the sequence of deoxyribonucleotide
residues in a sense strand of a DNA duplex. A "sense strand" of a DNA duplex refers to a strand in a
DNA duplex that is transcribed by a cell in its natural state into a "sense mRNA." Thus, an
"antisense" sequence is a sequence having the same sequence as the non-coding strand in a DNA duplex.
The term "antisense RNA" refers to an RNA transcript that is complementary to all or part of a target
primary transcript or mRNA and that blocks the expression of a target gene by interfering with the
processing, transport and/or translation of its primary transcript or mRNA. The complementarity of an
antisense RNA may be with any part of the specific gene transcript, i.e., at the 5' non-coding
sequence, 3' non-coding sequence, introns, or the coding sequence. In addition, as used herein,
antisense RNA may contain regions of ribozyme sequences that increase the efficacy of antisense RNA
to block gene expression. "Ribozyme" refers to a catalytic RNA and includes sequence-specific
endoribonucleases. "Antisense inhibition" refers to the production of antisense RNA transcripts
capable of preventing the expression of the target protein.
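Since the antisense strand is complementary to the sense strand and runs in the reverse 5' to 3' orientation, its sequence written 5' to 3' is the reverse complement of the sense strand. A minimal Python sketch of that relationship follows; it covers full complementarity only and does not model partial complementarity or ribozyme regions. The function names are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def antisense_dna(sense_strand: str) -> str:
    """Non-coding (antisense) strand of a DNA duplex, written 5' to 3'."""
    return sense_strand.translate(DNA_COMPLEMENT)[::-1]


def antisense_rna(target_mrna: str) -> str:
    """RNA fully complementary to a target mRNA segment, written 5' to 3'."""
    return target_mrna.translate(RNA_COMPLEMENT)[::-1]


print(antisense_dna("ATGCCG"), antisense_rna("AUGGC"))
```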

"Amplification" is a special case of nucleic acid replication involving template specificity. It 
is to be contrasted with non-specific template replication (i.e., replication that is
template-dependent but not dependent on a specific template). Template specificity is here
distinguished from fidelity of replication (i.e., synthesis of the proper polynucleotide sequence)
and nucleotide (ribo- or deoxyribo-) specificity. Template specificity is frequently described in
terms of "target" specificity. Target sequences are "targets" in the sense that they are sought to be
sorted out from other nucleic acid. Amplification techniques have been designed primarily for this
sorting out.

Template specificity is achieved in most amplification techniques by the choice of enzyme.
Amplification enzymes are enzymes that, under the conditions in which they are used, will process only
specific sequences of nucleic acid in a heterogeneous mixture of nucleic acid. For example, in the
case of Qβ replicase, MDV-1 RNA is the specific template for the replicase (Kacian et al., Proc. Natl.
Acad. Sci. USA, 69:3038, 1972). Other nucleic acid will not be replicated by this amplification
enzyme. Similarly, in the case of T7 RNA polymerase, this amplification enzyme has a stringent
specificity for its own promoters (Chamberlin et al., Nature, 228:227, 1970). In the case of T4 DNA
ligase, the enzyme will not ligate the two oligonucleotides or polynucleotides where there is a
mismatch between the oligonucleotide or polynucleotide substrate and the template at the ligation
junction (Wu and Wallace, Genomics, 4:560, 1989). Finally, Taq and Pfu polymerases, by virtue of
their ability to function at high temperature, are found to display high specificity for the
sequences bounded and thus defined by the primers; the high temperature results in thermodynamic
conditions that favor primer hybridization with the target sequences and not hybridization with
non-target sequences (H.A. Erlich (ed.), PCR Technology, Stockton Press, 1989).

The term "amplifiable nucleic acid" refers to nucleic acids that may be amplified by any
amplification method. It is contemplated that "amplifiable nucleic acid" will usually comprise
"sample template."

The term "sample template" refers to nucleic acid originating from a sample that is analyzed for the
presence of "target" (defined below). In contrast, "background template" is used in reference to
nucleic acid other than sample template that may or may not be present in a sample. Background
template is most often inadvertent. It may be the result of carryover, or it may be due to the
presence of nucleic acid contaminants sought to be purified away from the sample. For example,
nucleic acids from organisms other than those to be detected may be present as background in a test
sample.

The term "primer" refers to an oligonucleotide, whether occurring naturally as in a purified
restriction digest or produced synthetically, that is capable of acting as a point of initiation of
synthesis when placed under conditions in which synthesis of a primer extension product that is
complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an
inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably
single-stranded for maximum efficiency in amplification, but may alternatively be double-stranded. If
double-stranded, the primer is first treated to separate its strands before being used to prepare
extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be
sufficiently long to prime the synthesis of extension products in the presence of the inducing agent.
The exact lengths of the primers will depend on many factors, including temperature, source of
primer and the use of the method.

The term "probe" refers to an oligonucleotide (i.e., a sequence of nucleotides), whether occurring
naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR
amplification, that is capable of hybridizing to another oligonucleotide of interest. A probe may be
single-stranded or double-stranded. Probes are useful in the detection, identification and isolation
of particular gene sequences. It is contemplated that any probe used in the present invention will be
labeled with any "reporter molecule," so that it is detectable in any detection system, including,
but not limited to, enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), fluorescent,
radioactive, and luminescent systems. It is not intended that the present invention be limited to any
particular detection system or label.

The term "target," when used in reference to the polymerase chain reaction, refers to the region of
nucleic acid bounded by the primers used for polymerase chain reaction. Thus, the "target" is sought
to be sorted out from other nucleic acid sequences. A "segment" is defined as a region of nucleic
acid within the target sequence.

The term "polymerase chain reaction" ("PCR") refers to the method of K.B. Mullis, described in U.S.
Patent Nos. 4,683,195, 4,683,202, and 4,965,188, for increasing the concentration of a segment of a
target sequence in a mixture of genomic DNA without cloning or purification. This process for
amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers
to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal
cycling in the presence of a DNA polymerase. The two primers are complementary to their respective
strands of the double-stranded target sequence. To effect amplification, the mixture is denatured and
the primers are then annealed to their complementary sequences within the target molecule. Following
annealing, the primers are extended with a polymerase so as to form a new pair of complementary
strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many
times (i.e., denaturation, annealing and extension constitute one "cycle"; there can be numerous
"cycles") to obtain a high concentration of an amplified segment of the desired target sequence. The
length of the amplified segment of the desired target sequence is determined by the relative
positions of the primers with respect to each other, and therefore this length is a controllable
parameter. By virtue of the repeating aspect of the process, the method is referred to as the
"polymerase chain reaction" (hereinafter "PCR"). Because the desired amplified segments of the target
sequence become the predominant sequences (in terms of concentration) in the mixture, they are said
to be "PCR amplified."
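The statement that amplicon length is fixed by the relative positions of the primers can be illustrated with a simple in-silico PCR sketch in Python. It assumes exact primer matches on a template written 5' to 3' (no mismatches, no primer-dimers), and `ideal_copies` assumes a constant per-cycle efficiency, with 1.0 meaning perfect doubling; that doubling model is a common idealization, not something stated in the text. All names and sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Segment bounded by two primers, both written 5' to 3'.

    The forward primer has the sequence of the top strand; the reverse primer
    anneals to the top strand, so its site reads as its reverse complement.
    Exact matches only; returns None if either site is missing.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


def ideal_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after n cycles if each cycle multiplies the target by (1 + efficiency)."""
    return initial_copies * (1.0 + efficiency) ** cycles


template = "AAAA" + "ATGCGT" + "CCCCCC" + "GATTAC" + "TTTT"
print(predicted_amplicon(template, "ATGCGT", "GTAATC"))
```

Moving either primer site changes only the slice boundaries, which is why amplicon length is a controllable parameter.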

With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a
level detectable by several different methodologies (e.g., hybridization with a labeled probe;
incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of
32P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In
addition to genomic DNA, any oligonucleotide or polynucleotide sequence can be amplified with the
appropriate set of primer molecules. In particular, the amplified segments created by the PCR process
itself are, themselves, efficient templates for subsequent PCR amplifications.

The terms "PCR product," "PCR fragment," and "amplification product" refer to the resultant mixture
of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are
complete. These terms encompass the case where there has been amplification of one or more segments
of one or more target sequences.

The term "amplification reagents" refers to those reagents (deoxyribonucleotide triphosphates,
buffer, etc.) needed for amplification except for primers, nucleic acid template, and the
amplification enzyme. Typically, amplification reagents along with other reaction components are
placed and contained in a reaction vessel (test tube, microwell, etc.).

The term "reverse-transcriptase PCR" or "RT-PCR" refers to a type of PCR where the starting material
is mRNA. The starting mRNA is enzymatically converted to complementary DNA, or "cDNA," using a reverse
transcriptase enzyme. The cDNA is then used as a "template" for a "PCR" reaction.
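At the sequence level, the reverse transcription step above produces a DNA strand complementary to the mRNA, and the strand complementary to that cDNA carries the mRNA sequence with T in place of U. This Python sketch models only that base-pairing relationship, not enzyme behaviour or priming; the function names are hypothetical.

```python
_RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """cDNA made by reverse transcriptase: the DNA complement of the mRNA, written 5' to 3'."""
    return mrna.upper().translate(_RNA_TO_CDNA)[::-1]


def second_strand(cdna: str) -> str:
    """Strand complementary to the cDNA, written 5' to 3' (the mRNA sequence with T for U)."""
    return cdna.translate(_DNA_COMPLEMENT)[::-1]


cdna = first_strand_cdna("AUGGCUUAA")
print(cdna, second_strand(cdna))
```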

The term "gene expression" refers to the process of converting genetic information encoded in a gene
into RNA (e.g., mRNA, rRNA, tRNA, or snRNA) through "transcription" of the gene (i.e., via the
enzymatic action of an RNA polymerase), and into protein, through "translation" of mRNA. Gene
expression can be regulated at many stages in the process. "Up-regulation" or "activation" refers to
regulation that increases the production of gene expression products (i.e., RNA or protein), while
"down-regulation" or "repression" refers to regulation that decreases production. Molecules (e.g.,
transcription factors) that are involved in up-regulation or down-regulation are often called
"activators" and "repressors," respectively.

The terms "in operable combination," "in operable order" and "operably linked" refer to the linkage
of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the
transcription of a given gene and/or the synthesis of a desired protein molecule is produced. The
term also refers to the linkage of amino acid sequences in such a manner that a functional protein is
produced.

The term "regulatory element" refers to a genetic element which controls some aspect of the 
expression of nucleic acid sequences. For example, a promoter is a regulatory element which 
facilitates the initiation of transcription of an operably linked coding region. Other regulatory 
elements are splicing signals, polyadenylation signals, termination signals, etc. 

Transcriptional control signals in eukaryotes comprise "promoter" and "enhancer" elements. Promoters
and enhancers consist of short arrays of DNA sequences that interact specifically with cellular
proteins involved in transcription (Maniatis et al., Science 236:1237, 1987). Promoter and enhancer
elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect,
mammalian and plant cells. Promoter and enhancer elements have also been isolated from viruses, and
analogous control elements, such as promoters, are also found in prokaryotes. The selection of a
particular promoter and enhancer depends on the cell type used to express the protein of interest.
Some eukaryotic promoters and enhancers have a broad host range while others are functional in a
limited subset of cell types (for review, see Voss et al., Trends Biochem. Sci., 11:287, 1986; and
Maniatis et al., supra, 1987).

The terms "promoter element," "promoter" or "promoter sequence" refer to a DNA sequence that is
located at the 5' end of (i.e., precedes) the coding region of a DNA polymer. The location of most
promoters known in nature precedes the transcribed region. The promoter functions as a switch,
activating the expression of a gene. If the gene is activated, it is said to be transcribed, or to be
participating in transcription. Transcription involves the synthesis of mRNA from the gene. The
promoter, therefore, serves as a transcriptional regulatory element and also provides a site for
initiation of transcription of the gene into mRNA.

Promoters may be tissue specific or cell specific. The term "tissue specific" as it applies to a
promoter refers to a promoter that is capable of directing selective expression of a nucleotide
sequence of interest to a specific type of tissue (e.g., petals) in the relative absence of expression
of the same nucleotide sequence of interest in a different type of tissue (e.g., roots). Tissue
specificity of a promoter may be evaluated by, for example, operably linking a reporter gene to the
promoter sequence to generate a reporter construct, introducing the reporter construct into the
genome of a plant such that the reporter construct is integrated into every tissue of the resulting
transgenic plant, and detecting the expression of the reporter gene (e.g., detecting mRNA, protein, or
the activity of a protein encoded by the reporter gene) in different tissues of the transgenic plant.
The detection of a greater level of expression of the reporter gene in one or more tissues relative
to the level of expression of the reporter gene in other tissues shows that the promoter is specific
for the tissues in which greater levels of expression are detected. The term "cell type specific" as
applied to a promoter refers to a promoter that is capable of directing selective expression of a
nucleotide sequence of interest in a specific type of cell in the relative absence of expression of
the same nucleotide sequence of interest in a different type of cell within the same tissue. The term
"cell type specific" when applied to a promoter also means a promoter capable of promoting selective
expression of a nucleotide sequence of interest in a region within a single tissue. Cell type
specificity of a promoter may be assessed using methods well known in the art, e.g.,
immunohistochemical staining. Briefly, tissue sections are embedded in paraffin, and paraffin
sections are reacted with a primary antibody that is specific for the polypeptide product encoded by
the nucleotide sequence of interest whose expression is controlled by the promoter. A labeled (e.g.,
peroxidase-conjugated) secondary antibody that is specific for the primary antibody is allowed to bind
to the sectioned tissue, and specific binding is detected (e.g., with avidin/biotin) by microscopy.
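The reporter-based evaluation of tissue specificity described above ends with a comparison of expression levels across tissues. The Python sketch below shows one way such a comparison might be scripted. The `fold` cut-off, the use of the mean of the other tissues as the baseline, and the sample values are all hypothetical; the definition itself only requires a "greater level of expression" relative to other tissues.

```python
from typing import Dict, List


def tissues_with_greater_expression(levels: Dict[str, float], fold: float = 5.0) -> List[str]:
    """Tissues whose reporter level is at least `fold` times the mean of all other tissues.

    `fold` is a hypothetical cut-off chosen for illustration.
    """
    if len(levels) < 2:
        raise ValueError("need measurements from at least two tissues")
    flagged = []
    for tissue, level in levels.items():
        others = [value for name, value in levels.items() if name != tissue]
        baseline = sum(others) / len(others)
        if level > 0 and (baseline == 0 or level >= fold * baseline):
            flagged.append(tissue)
    return flagged


# Hypothetical reporter activity measured in three tissues of a transgenic plant.
print(tissues_with_greater_expression({"petals": 120.0, "leaves": 4.0, "roots": 2.0}))
```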
Promoters may be constitutive or regulatable. The term "constitutive" when made in reference to a
promoter means that the promoter is capable of directing transcription of an operably linked nucleic
acid sequence in the absence of a stimulus (e.g., heat shock, chemicals, light, etc.). Typically,
constitutive promoters are capable of directing expression of a transgene in substantially any cell
and any tissue. In contrast, a "regulatable" promoter is one that is capable of directing a level of
transcription of an operably linked nucleic acid sequence in the presence of a stimulus (e.g., heat
shock, chemicals, light, etc.) that is different from the level of transcription of the operably
linked nucleic acid sequence in the absence of the stimulus.
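The distinction above rests on comparing transcription levels measured with and without a stimulus. The following Python sketch encodes that comparison; the two-fold threshold is a hypothetical stand-in for "different from", and a change in either direction (induction or repression) is counted as regulatable.

```python
def classify_promoter(without_stimulus: float, with_stimulus: float, fold: float = 2.0) -> str:
    """Classify a promoter from transcription levels measured without and with a stimulus.

    'regulatable' if the two levels differ by at least `fold` (hypothetical
    cut-off) in either direction; otherwise 'constitutive' if transcription
    occurs without the stimulus.
    """
    if without_stimulus < 0 or with_stimulus < 0:
        raise ValueError("levels must be non-negative")
    low, high = sorted((without_stimulus, with_stimulus))
    if high > 0 and (low == 0 or high / low >= fold):
        return "regulatable"
    if without_stimulus > 0:
        return "constitutive"
    return "no detectable transcription"


print(classify_promoter(0.0, 50.0), classify_promoter(10.0, 11.0))
```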

The terms "infecting" and "infection" with a bacterium refer to co-incubation of a target biological
sample (e.g., cell, tissue, etc.) with the bacterium under conditions such that nucleic acid
sequences contained within the bacterium are introduced into one or more cells of the target
biological sample.

The term "Agrobacterium" refers to a soil-borne, Gram-negative, rod-shaped phytopathogenic bacterium
that causes crown gall. The term "Agrobacterium" includes, but is not limited to, the strains
Agrobacterium tumefaciens (which typically causes crown gall in infected plants) and Agrobacterium
rhizogenes (which causes hairy root disease in infected host plants). Infection of a plant cell with
Agrobacterium generally results in the production of opines (e.g., nopaline, agropine, octopine,
etc.) by the infected cell. Thus, Agrobacterium strains that cause production of nopaline (e.g.,
strains LBA4301, C58, A208) are referred to as "nopaline-type" Agrobacteria; Agrobacterium strains
that cause production of octopine (e.g., strains LBA4404, Ach5, B6) are referred to as
"octopine-type" Agrobacteria; and Agrobacterium strains that cause production of agropine (e.g.,
strains EHA105, EHA101, A281) are referred to as "agropine-type" Agrobacteria.
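The strain-to-opine-type grouping above maps naturally onto a lookup table. This Python sketch contains only the strains named in the definition; the table and function names are hypothetical, and any strain not listed is reported as unknown rather than guessed.

```python
from typing import Dict

# Opine type of the Agrobacterium strains named in the definition above.
OPINE_TYPE: Dict[str, str] = {
    "LBA4301": "nopaline", "C58": "nopaline", "A208": "nopaline",
    "LBA4404": "octopine", "Ach5": "octopine", "B6": "octopine",
    "EHA105": "agropine", "EHA101": "agropine", "A281": "agropine",
}


def opine_type(strain: str) -> str:
    """Return 'nopaline', 'octopine' or 'agropine', or 'unknown' for strains not listed."""
    return OPINE_TYPE.get(strain, "unknown")


print(opine_type("LBA4404"))
```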

The term "regulatory region" refers to a gene's 5' transcribed but untranslated regions, located
immediately downstream from the promoter and ending just prior to the translational start of the
gene.

The term "promoter region" refers to the region immediately upstream of the coding region of a DNA
polymer, and is typically between about 500 bp and 4 kb in length, and is preferably about 1 to 1.5
kb in length.
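Given an annotated sequence, the promoter region as defined above is simply the stretch immediately upstream of the coding start. The Python sketch below extracts it, assuming a 0-based coding start on the same strand and defaulting to 1500 bp from the preferred 1 to 1.5 kb range; the function name and parameters are hypothetical, and the region is clipped if the sequence begins sooner.

```python
def promoter_region(sequence: str, coding_start: int, length: int = 1500) -> str:
    """Return up to `length` bases immediately upstream (5') of a coding region.

    `coding_start` is the 0-based index of the first coding base on the same
    strand. The region is clipped at the start of `sequence`.
    """
    if not 0 <= coding_start <= len(sequence):
        raise ValueError("coding_start outside sequence")
    if length <= 0:
        raise ValueError("length must be positive")
    return sequence[max(0, coding_start - length):coding_start]


print(promoter_region("AAACCCATG", 6, length=3))
```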

In contrast, an "inducible" promoter is one that is capable of directing a level of transcription of
an operably linked nucleic acid sequence in the presence of a stimulus (e.g., heat shock, chemicals,
light, etc.) that is different from the level of transcription of the operably linked nucleic acid
sequence in the absence of the stimulus.

The term "regulatory element" refers to a genetic element that controls some aspect of the 
expression of nucleic acid sequence(s). For example, a promoter is a regulatory element that 
facilitates the initiation of transcription of an operably linked coding region. Other regulatory 
elements are splicing signals, polyadenylation signals, termination signals, etc. 

The enhancer and/or promoter may be "endogenous" or "exogenous" or "heterologous." An 
"endogenous" enhancer or promoter is one that is naturally linked with a given gene in the genome. 
An "exogenous" or "heterologous" enhancer or promoter is one that is placed in juxtaposition to a 
gene by means of genetic manipulation (i.e., molecular biological techniques) such that transcription 
of the gene is directed by the linked enhancer or promoter. For example, an endogenous promoter in 
operable combination with a first gene can be isolated, removed, and placed in operable combination 
with a second gene, thereby making it a "heterologous promoter" in operable combination with the 
second gene. A variety of such combinations are contemplated (e.g., the first and second genes can 
be from the same species, or from different species). 

The term "naturally linked" or "naturally located" when used in reference to the relative 
positions of nucleic acid sequences means that the nucleic acid sequences exist in nature in the 
relative positions.

The presence of "splicing signals" on an expression vector often results in higher levels of
expression of the recombinant transcript in eukaryotic host cells. Splicing signals mediate the
removal of introns from the primary RNA transcript and consist of a splice donor and acceptor site
(Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory
Press, New York [1989] pp. 16.7-16.8). A commonly used splice donor and acceptor site is the splice
junction from the 16S RNA of SV40.

Efficient expression of recombinant DNA sequences in eukaryotic cells requires expression of signals
directing the efficient termination and polyadenylation of the resulting transcript. Transcription
termination signals are generally found downstream of the polyadenylation signal and are a few
hundred nucleotides in length. The term "poly(A) site" or "poly(A) sequence" as used herein denotes a
DNA sequence that directs both the termination and polyadenylation of the nascent RNA transcript.
Efficient polyadenylation of the recombinant transcript is desirable, as transcripts lacking a
poly(A) tail are unstable and are rapidly degraded. The poly(A) signal utilized in an expression
vector may be "heterologous" or "endogenous." An endogenous poly(A) signal is one that is found
naturally at the 3' end of the coding region of a given gene in the genome. A heterologous poly(A)
signal is one that has been isolated from one gene and positioned 3' to another gene. A commonly used
heterologous poly(A) signal is the SV40 poly(A) signal. The SV40 poly(A) signal is contained on a 237
bp BamHI/BclI restriction fragment and directs both termination and polyadenylation (Sambrook,
supra, at 16.6-16.7).

The term "vector" refers to any genetic element, such as a plasmid, phage, transposon, cosmid,
chromosome, retrovirus, virion, or similar genetic element, that is capable of replication when
associated with the proper control elements and that can transfer gene sequences into cells and/or
between cells. Thus, this term includes cloning and expression vehicles, as well as viral vectors.

The term "expression vector" as used herein refers to a recombinant DNA molecule containing a desired
coding sequence (or coding sequences), such as the coding sequence(s) for the hybrid enzyme(s)
described in more detail below, and appropriate nucleic acid sequences necessary for the expression
of the operably linked coding sequence in a particular host cell or organism. Nucleic acid sequences
necessary for expression in prokaryotes usually include a promoter, an operator (optional), and a
ribosome binding site, often along with other sequences. Eukaryotic cells are known to utilize
promoters, enhancers, and termination and polyadenylation signals. It is not intended that the
present invention be limited to particular expression vectors or expression vectors with particular
elements.

The term "transfection" refers to the introduction of foreign DNA into cells. Transfection 
may be accomplished by a variety of means known to the art including calcium phosphate-DNA co- 
precipitation, DEAE-dextran-mediated transfection, polybrene-mediated transfection, glass beads, 
electroporation, microinjection, liposome fusion, lipofection, protoplast fusion, viral infection,
biolistics (i.e., particle bombardment) and the like. 

The term "stable transfection" or "stably transfected" refers to the introduction and
integration of foreign DNA into the genome of the transfected cell. The term "stable transfectant"
refers to a cell that has stably integrated foreign DNA into the genomic DNA.

The term "transient transfection" or "transiently transfected" refers to the introduction of
foreign DNA into a cell where the foreign DNA fails to integrate into the genome of the transfected
cell. The foreign DNA persists in the nucleus of the transfected cell for several days. During this
time the foreign DNA is subject to the regulatory controls that govern the expression of endogenous
genes in the chromosomes. The term "transient transfectant" refers to cells that have taken up
foreign DNA but have failed to integrate this DNA.

The term "calcium phosphate co-precipitation" refers to a technique for the introduction of 
nucleic acids into a cell. The uptake of nucleic acids by cells is enhanced when the nucleic acid is 
presented as a calcium phosphate-nucleic acid co-precipitate. The original technique of Graham and
van der Eb (Graham and van der Eb, Virology 52:456, 1973) has been modified by several groups to
optimize conditions for particular types of cells; these numerous modifications are well known in
the art.

The terms "bombarding," "bombardment," and "biolistic bombardment" refer to the process of
accelerating particles towards a target biological sample (e.g., cell, tissue, etc.) to effect wounding of
the cell membrane of a cell in the target biological sample and/or entry of the particles into the target
biological sample. Methods for biolistic bombardment are known in the art (e.g., U.S. Patent No.
5,584,807, the contents of which are incorporated herein by reference), and are commercially
available (e.g., the helium gas-driven microprojectile accelerator (PDS-1000/He, BioRad)).

The term "microwounding" when made in reference to plant tissue refers to the introduction 
of microscopic wounds in that tissue. Microwounding may be achieved by, for example, particle
bombardment as described herein. 

The term "plant" as used herein refers to a plurality of plant cells which are largely
differentiated into a structure that is present at any stage of a plant's development. Such structures
include, but are not limited to, a fruit, shoot, stem, leaf, flower petal, etc. The term "plant tissue"
includes differentiated and undifferentiated tissues of plants including, but not limited to, roots,
shoots, leaves, pollen, seeds, tumor tissue and various types of cells in culture (e.g., single cells,
protoplasts, embryos, callus, protocorm-like bodies, etc.). Plant tissue may be in planta, in organ
culture, tissue culture, or cell culture. Similarly, "plant cell(s)" may be cells in culture or may be
part of a plant.

The term "transgenic" when used in reference to a cell refers to a cell which contains a
transgene, or whose genome has been altered by the introduction of a transgene. The term
"transgenic" when used in reference to a cell, tissue or plant refers to a cell, tissue or plant,
respectively, which comprises a transgene, where one or more cells of the tissue contain a transgene
(such as a gene encoding the hybrid enzyme(s) of the present invention), or a plant whose genome
has been altered by the introduction of a transgene. Transgenic cells, tissues and plants may be
produced by several methods including the introduction of a "transgene" comprising nucleic acid
(usually DNA) into a target cell or integration of the transgene into a chromosome of a target cell by
way of human intervention, such as by the methods described herein.

The term "transgene" as used herein refers to any nucleic acid sequence which is introduced
into the genome of a cell by experimental manipulations. A transgene may be an "endogenous DNA
sequence" or a "heterologous DNA sequence" (i.e., "foreign DNA"). The term "endogenous DNA
sequence" refers to a nucleotide sequence which is naturally found in the cell into which it is
introduced so long as it does not contain some modification (e.g., a point mutation, the presence of a
selectable marker gene, or other like modifications) relative to the naturally-occurring sequence. The
term "heterologous DNA sequence" refers to a nucleotide sequence which is ligated to, or is
manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in nature, or to
which it is ligated at a different location in nature. Heterologous DNA is not endogenous to the cell
into which it is introduced, but has been obtained from another cell. Heterologous DNA also includes
an endogenous DNA sequence which contains some modification. Generally, although not
necessarily, heterologous DNA encodes RNA and proteins that are not normally produced by the cell
in which it is expressed. Examples of heterologous DNA include reporter genes, transcriptional
and translational regulatory sequences, selectable marker proteins (e.g., proteins which confer drug
resistance), or other similar elements.

The term "foreign gene" refers to any nucleic acid (e.g., gene sequence) which is introduced 
into the genome of a cell by experimental manipulations and may include gene sequences found in
that cell so long as the introduced gene contains some modification (e.g., a point mutation, the 
presence of a selectable marker gene, or other like modifications) relative to the naturally-occurring 
gene. 

The term "transformation" as used herein refers to the introduction of a transgene into a cell.
Transformation of a cell may be stable or transient. The term "transient transformation" or
"transiently transformed" refers to the introduction of one or more transgenes into a cell in the
absence of integration of the transgene into the host cell's genome. Transient transformation may be
detected by, for example, enzyme-linked immunosorbent assay (ELISA), which detects the presence
of a polypeptide encoded by one or more of the transgenes. Alternatively, transient transformation
may be detected by detecting the activity of the protein (e.g., β-glucuronidase) encoded by the
transgene (e.g., the uidA gene) as demonstrated herein (e.g., histochemical assay of GUS enzyme
activity by staining with X-gluc, which gives a blue precipitate in the presence of the GUS enzyme;
and a chemiluminescent assay of GUS enzyme activity using the GUS-Light kit (Tropix)). The term
"transient transformant" refers to a cell which has transiently incorporated one or more transgenes.
In contrast, the term "stable transformation" or "stably transformed" refers to the introduction and
integration of one or more transgenes into the genome of a cell. Stable transformation of a cell may
be detected by Southern blot hybridization of genomic DNA of the cell with nucleic acid sequences
which are capable of binding to one or more of the transgenes. Alternatively, stable transformation
of a cell may also be detected by the polymerase chain reaction of genomic DNA of the cell to
amplify transgene sequences. The term "stable transformant" refers to a cell which has stably
integrated one or more transgenes into the genomic DNA. Thus, a stable transformant is
distinguished from a transient transformant in that, whereas genomic DNA from the stable
transformant contains one or more transgenes, genomic DNA from the transient transformant does
not contain a transgene.

The term "host cell" refers to any cell capable of replicating and/or transcribing and/or
translating a heterologous gene. Thus, a "host cell" refers to any eukaryotic or prokaryotic cell (e.g.,
bacterial cells such as E. coli, yeast cells, mammalian cells, avian cells, amphibian cells, plant cells,
fish cells, and insect cells), whether located in vitro or in vivo. For example, host cells may be located
in a transgenic animal.

The terms "transformants" or "transformed cells" include the primary transformed cell and
cultures derived from that cell without regard to the number of transfers. Not all progeny may be
precisely identical in DNA content, due to deliberate or inadvertent mutations. Mutant progeny that
have the same functionality as screened for in the originally transformed cell are included in the
definition of transformants.

The term "selectable marker" refers to a gene which encodes an enzyme having an activity
that confers resistance to an antibiotic or drug upon the cell in which the selectable marker is
expressed, or which confers expression of a trait which can be detected (e.g., luminescence or
fluorescence). Selectable markers may be "positive" or "negative." Examples of positive selectable
markers include the neomycin phosphotransferase (NPTII) gene, which confers resistance to G418 and
to kanamycin, and the bacterial hygromycin phosphotransferase gene (hyg), which confers resistance
to the antibiotic hygromycin. Negative selectable markers encode an enzymatic activity whose
expression is cytotoxic to the cell when grown in an appropriate selective medium. For example, the
HSV-tk gene is commonly used as a negative selectable marker. Expression of the HSV-tk gene in
cells grown in the presence of ganciclovir or acyclovir is cytotoxic; thus, growth of cells in selective
medium containing ganciclovir or acyclovir selects against cells capable of expressing a functional
HSV TK enzyme.

The term "reporter gene" refers to a gene encoding a protein that may be assayed. Examples
of reporter genes include, but are not limited to, luciferase (see, e.g., deWet et al., Mol. Cell Biol.
7:725, 1987 and U.S. Pat. Nos. 6,074,859; 5,976,796; 5,674,713; and 5,618,682; all of which are
incorporated herein by reference), green fluorescent protein (e.g., GenBank Accession Number
U43284; a number of GFP variants are commercially available from CLONTECH Laboratories, Palo
Alto, CA), chloramphenicol acetyltransferase, β-galactosidase, alkaline phosphatase, and horseradish
peroxidase.

The term "overexpression" refers to the production of a gene product in transgenic
organisms that exceeds levels of production in normal or non-transformed organisms. The term
"cosuppression" refers to the expression of a foreign gene which has substantial homology to an
endogenous gene resulting in the suppression of expression of both the foreign and the endogenous
gene. As used herein, the term "altered levels" refers to the production of gene product(s) in
transgenic organisms in amounts or proportions that differ from that of normal or non-transformed
organisms.
The terms "Southern blot analysis" and "Southern blot" and "Southern" refer to the analysis
of DNA on agarose or acrylamide gels in which DNA is separated or fragmented according to size
followed by transfer of the DNA from the gel to a solid support, such as nitrocellulose or a nylon
membrane. The immobilized DNA is then exposed to a labeled probe to detect DNA species
complementary to the probe used. The DNA may be cleaved with restriction enzymes prior to
electrophoresis. Following electrophoresis, the DNA may be partially depurinated and denatured
prior to or during transfer to the solid support. Southern blots are a standard tool of molecular
biologists (J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press,
NY, pp 9.31-9.58, 1989).

The terms "Northern blot analysis" and "Northern blot" and "Northern" refer to the analysis of
RNA by electrophoresis of RNA on agarose gels to fractionate the RNA according to size followed
by transfer of the RNA from the gel to a solid support, such as nitrocellulose or a nylon membrane.
The immobilized RNA is then probed with a labeled probe to detect RNA species complementary to
the probe used. Northern blots are a standard tool of molecular biologists (J. Sambrook, et al., supra,
pp 7.39-7.52, 1989).

The terms "Western blot analysis" and "Western blot" and "Western" refer to the analysis of
protein(s) (or polypeptides) immobilized onto a support such as nitrocellulose or a membrane. A
mixture comprising at least one protein is first separated on an acrylamide gel, and the separated
proteins are then transferred from the gel to a solid support, such as nitrocellulose or a nylon
membrane. The immobilized proteins are exposed to at least one antibody with reactivity against at
least one antigen of interest. The bound antibodies may be detected by various methods, including
the use of radiolabeled antibodies.

The term "antigenic determinant" refers to that portion of an antigen that makes contact with
a particular antibody (i.e., an epitope). When a protein or fragment of a protein is used to immunize
a host animal, numerous regions of the protein may induce the production of antibodies that bind
specifically to a given region or three-dimensional structure on the protein; these regions or
structures are referred to as antigenic determinants. An antigenic determinant may compete with the
intact antigen (i.e., the "immunogen" used to elicit the immune response) for binding to an antibody.

The term "isolated" when used in relation to a nucleic acid, as in "an isolated nucleic acid
sequence," refers to a nucleic acid sequence that is identified and separated from one or more other
components (e.g., separated from a cell containing the nucleic acid, or separated from at least one
contaminant nucleic acid, or separated from one or more proteins or one or more lipids) with which it
is ordinarily associated in its natural source. Isolated nucleic acid is nucleic acid present in a form or
setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids
are nucleic acids, such as DNA and RNA, which are found in the state in which they exist in nature. For
example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to
neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein,
are found in the cell as a mixture with numerous other mRNAs which encode a multitude of proteins.
However, an isolated nucleic acid sequence comprising, for example, SEQ ID NO:1 includes, by way
of example, such nucleic acid sequences in cells which ordinarily contain, for example, SEQ ID
NO:1 where the nucleic acid sequence is in a chromosomal or extrachromosomal location different
from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found
in nature. The isolated nucleic acid sequence may be present in single-stranded or double-stranded
form. When an isolated nucleic acid sequence is to be utilized to express a protein, the nucleic acid
sequence will contain at a minimum at least a portion of the sense or coding strand (i.e., the nucleic
acid sequence may be single-stranded). Alternatively, it may contain both the sense and anti-sense
strands (i.e., the nucleic acid sequence may be double-stranded).

The term "purified" refers to molecules, either nucleic or amino acid sequences, that are
removed from their natural environment (or components of their natural environment), isolated or
separated. An "isolated nucleic acid sequence" may therefore be a purified nucleic acid sequence.
"Substantially purified" molecules are at least 60 % free, preferably at least 75 % free, and more
preferably at least 90 % free from other components with which they are naturally associated. As
used herein, the term "purified" or "to purify" also refers to the removal of contaminants from a
sample. The removal of contaminating proteins results in an increase in the percent of polypeptide of
interest in the sample. In another example, recombinant polypeptides are expressed in plant,
bacterial, yeast, or mammalian host cells and the polypeptides are purified by the removal of host cell
proteins; the percent of recombinant polypeptides is thereby increased in the sample. The present
invention contemplates both purified (including substantially purified) and unpurified hybrid
enzyme(s).

The term "composition comprising" a given polynucleotide sequence or polypeptide refers 
broadly to any composition containing the given polynucleotide sequence or polypeptide. The 
composition may comprise an aqueous solution. Compositions comprising polynucleotide sequences 
encoding GnTIII or fragments thereof may be employed as hybridization probes. In this case, the
GnTIII-encoding polynucleotide sequences are typically employed in an aqueous solution containing
salts (e.g., NaCl), detergents (e.g., SDS), and other components (e.g., Denhardt's solution, dry milk,
salmon sperm DNA, etc.).

The term "test compound" refers to any chemical entity, pharmaceutical, drug, and the like
that can be used to treat or prevent a disease, illness, sickness, or disorder of bodily function, or
otherwise alter the physiological or cellular status of a sample. Test compounds comprise both
known and potential therapeutic compounds. A test compound can be determined to be therapeutic
by screening using the screening methods of the present invention. A "known therapeutic
compound" refers to a therapeutic compound that has been shown (e.g., through animal trials or prior
experience with administration to humans) to be effective in such treatment or prevention.

As used herein, the term "response," when used in reference to an assay, refers to the 
generation of a detectable signal (e.g., accumulation of reporter protein, increase in ion 
concentration, accumulation of a detectable chemical product). 

The term "sample" is used in its broadest sense. In one sense it can refer to an animal cell or
tissue. In another sense, it is meant to include a specimen or culture obtained from any source, as
well as biological and environmental samples. Biological samples may be obtained from plants or
animals (including humans) and encompass fluids, solids, tissues, and gases. Environmental samples
include environmental material such as surface matter, soil, water, and industrial samples. These
examples are not to be construed as limiting the sample types applicable to the present invention.

The term "fusion protein" refers to a protein wherein at least one part or portion is from a
first protein and another part or portion is from a second protein. The term "hybrid enzyme" refers to
a fusion protein which is a functional enzyme, wherein at least one part or portion is from a first
species and another part or portion is from a second species. Preferred hybrid enzymes of the
present invention are functional glycosyltransferases (or portions thereof) wherein at least one part or
portion is from a plant and another part or portion is from a mammal (such as human).

The term "introduction into a cell" in the context of nucleic acid (e.g., vectors) is intended to
include what the art calls "transformation" or "transfection" or "transduction." Transformation of a
cell may be stable or transient, and the present invention contemplates introduction of vectors under
conditions where, on the one hand, there is stable expression, and on the other hand, where there is
only transient expression. The term "transient transformation" or "transiently transformed" refers to
the introduction of one or more transgenes into a cell in the absence of integration of the transgene
into the host cell's genome. Transient transformation may be detected by, for example,
enzyme-linked immunosorbent assay (ELISA), which detects the presence of a polypeptide encoded
by one or more of the transgenes. Alternatively, transient transformation may be detected by
detecting the activity of the protein (e.g., antigen binding of an antibody) encoded by the transgene
(e.g., the antibody gene). The term "transient transformant" refers to a cell which has transiently
incorporated one or more transgenes. In contrast, the term "stable transformation" or "stably
transformed" refers to the introduction and integration of one or more transgenes into the genome of
a cell. Stable transformation of a cell may be detected by Southern blot hybridization of genomic
DNA of the cell with nucleic acid sequences which are capable of binding to one or more of the
transgenes. Alternatively, stable transformation of a cell may also be detected by the polymerase
chain reaction (PCR) of genomic DNA of the cell to amplify transgene sequences. The term "stable
transformant" refers to a cell which has stably integrated one or more transgenes into the genomic
DNA. Thus, a stable transformant is distinguished from a transient transformant in that, whereas
genomic DNA from the stable transformant contains one or more transgenes, genomic DNA from the
transient transformant does not contain a transgene.

"Bisected oligosaccharide" shall be defined as an oligosaccharide comprising, e.g., two
mannose groups and another saccharide moiety attached to a mannose residue of the oligosaccharide.
Examples of bisected oligosaccharides are given in Table 1.

DETAILED DESCRIPTION OF THE INVENTION 

The GnTIII (for example, SEQ ID NO: 1, Figure 4A) expressed in the plant host cell of the
present invention is a mammalian GnTIII. In a specific embodiment, the GnTIII is a human GnTIII
(for example, SEQ ID NO: 2, Figure 4B). In a specific embodiment, the GnTIII may also have at least
80 % identity with the amino acid sequence of a mammalian GnTIII, more preferably at least about 90 %,
even more preferably at least about 95 %, and most preferably at least about 97 % (hereinafter
"homologous polypeptides"), which qualitatively retain the activity of said mammalian GnTIII. A
polypeptide that has an amino acid sequence at least, for example, 95 % "identical" to a query amino
acid sequence is identical to the query sequence except that the subject polypeptide sequence may
include, on average, up to five amino acid alterations per each 100 amino acids of the query amino
acid sequence. In other words, to obtain a polypeptide having an amino acid sequence at least 95 %
identical to a query amino acid sequence, up to 5 % of the amino acid residues in the subject
sequence may be inserted, deleted or substituted with another amino acid. These alterations of the
reference sequence may occur at the amino or carboxy terminal positions of the reference amino acid
sequence or anywhere between those terminal positions, interspersed either individually among
residues in the reference sequence or in one or more contiguous groups within the reference
sequence.
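The "up to five alterations per 100 residues" rule scales linearly with the length of the query. The following is a minimal sketch of that arithmetic, assuming the allowance is computed over the full query length and rounded down to whole residues (the rounding convention and the function name `max_alterations` are our assumptions, not part of the specification).

```python
import math
from fractions import Fraction


def max_alterations(query_length: int, percent_identity) -> int:
    """Largest whole number of residues that may be inserted, deleted or
    substituted while remaining at least `percent_identity` % identical to a
    query of `query_length` residues. Rounding down is an assumption."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    # Parse through str() so values such as 97.5 are treated as exact decimals
    pct = Fraction(str(percent_identity))
    if not 0 <= pct <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    allowed = query_length * (100 - pct) / 100
    return math.floor(allowed)
```

For example, at the 95 % threshold a 100-residue query tolerates 5 alterations, while a 333-residue query tolerates 16 (16.65 rounded down).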

The best overall match between a query sequence (a sequence of the present invention) and a
subject sequence, also referred to as a global sequence alignment, is preferably determined using
the FASTDB computer program based on the algorithm of Brutlag, et al. (Comp. App. Biosci.
6:237-245, 1990). In a sequence alignment, the query and subject sequences are either both
nucleotide sequences or both amino acid sequences. The result of said global sequence alignment is
expressed as percent identity. Preferred parameters used in a FASTDB amino acid alignment are:
Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization Group
Length=0, Cutoff Score=1, Window Size=sequence length, Gap Penalty=5, Gap Size Penalty=0.05,
Window Size=500 or the length of the subject amino acid sequence, whichever is shorter.

If the subject sequence is shorter than the query sequence because of N- or C-terminal deletions,
not because of internal deletions, a manual correction must be made to the results. This is because
the FASTDB program does not account for N- and C-terminal truncations of the subject sequence
when calculating global percent identity. For subject sequences truncated at the N- and C-termini,
relative to the query sequence, the percent identity is corrected by calculating the number of residues
of the query sequence that are N- and C-terminal of the subject sequence, which are not
matched/aligned with a corresponding subject residue, as a percent of the total residues of the query
sequence. Whether a residue is matched/aligned is determined by the results of the FASTDB sequence
alignment. This percentage is then subtracted from the percent identity, calculated by the above
FASTDB program using the specified parameters, to arrive at a final percent identity score. This
final percent identity score is what is used for the purposes of the present invention. Only residues to
the N- and C-termini of the subject sequence, which are not matched/aligned with the query
sequence, are considered for the purposes of manually adjusting the percent identity score; that is,
only query residue positions outside the farthest N- and C-terminal residues of the subject sequence
are counted.
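The terminal-truncation correction described above reduces to simple arithmetic once the alignment is known. The sketch below assumes the FASTDB percent identity and the query positions paired with the subject's outermost residues have already been read off the alignment; the function and parameter names are hypothetical.

```python
def corrected_percent_identity(fastdb_identity: float, query_length: int,
                               first_aligned_query_pos: int,
                               last_aligned_query_pos: int) -> float:
    """Apply the manual N-/C-terminal truncation correction.

    fastdb_identity: percent identity reported by the global alignment.
    query_length: number of residues in the query sequence.
    first_aligned_query_pos, last_aligned_query_pos: 1-based query positions
    paired with the most N-terminal and most C-terminal subject residues.
    """
    if not 1 <= first_aligned_query_pos <= last_aligned_query_pos <= query_length:
        raise ValueError("aligned positions must lie within the query")
    # Query residues beyond either end of the subject count as unmatched;
    # internal deletions are deliberately not counted here.
    n_terminal_overhang = first_aligned_query_pos - 1
    c_terminal_overhang = query_length - last_aligned_query_pos
    unmatched_percent = 100 * (n_terminal_overhang + c_terminal_overhang) / query_length
    return fastdb_identity - unmatched_percent
```

For instance, if a subject aligns to query positions 11 through 100 of a 100-residue query with a reported identity of 100 %, the ten unmatched N-terminal query residues lower the final score to 90 %.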

The GnTIII expressed in the plant host system of the present invention is encoded by a
nucleic acid sequence that has at least 80 % identity with the nucleic acid sequence encoding an
amino acid sequence of a mammalian GnTIII, more preferably at least about 90 %, even more
preferably at least about 95 %, and most preferably at least about 97 % (hereinafter "homologous
polypeptides"), which qualitatively retain the activity of said mammalian GnTIII. The nucleic acid
sequence may be an RNA or DNA sequence.

A polynucleotide having 95 % "identity" to a reference nucleotide sequence of the present
invention is identical to the reference sequence except that the polynucleotide sequence may include,
on average, up to five point mutations per each 100 nucleotides of the reference nucleotide sequence
encoding the polypeptide. In other words, to obtain a polynucleotide having a nucleotide sequence at
least 95 % identical to a reference nucleotide sequence, up to 5 % of the nucleotides in the reference
sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5 %
of the total nucleotides in the reference sequence may be inserted into the reference sequence. The
query sequence may be an entire sequence, the ORF (open reading frame), or any fragment specified
as described herein.

As a practical matter, whether any particular nucleic acid molecule or polypeptide is at least
90 %, 95 %, 96 %, 97 %, 98 % or 99 % identical to a nucleotide sequence of the present invention
can be determined conventionally using known computer programs. The best overall match between
a query sequence (a sequence of the present invention) and a subject sequence, also referred to as a
global sequence alignment, is preferably determined using the FASTDB computer program based on
the algorithm of Brutlag, et al. (Comp. App. Biosci. 6:237-245, 1990). In a sequence alignment the
query and subject sequences are both DNA sequences. An RNA sequence can be compared by
converting U's (uracil) to T's (thymine). The result of said global sequence alignment is expressed
as percent identity. Preferred parameters used in a FASTDB alignment of DNA sequences to
calculate percent identity are: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining Penalty=30,
Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size Penalty=0.05, Window
Size=500 or the length of the subject nucleotide sequence, whichever is shorter.
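The U-to-T conversion step can be illustrated together with a simple identity count over an alignment that already exists. This is a simplified stand-in, not a reimplementation of FASTDB: it assumes the two aligned rows are supplied with `-` marking gaps, and it uses the number of alignment columns as the denominator, which is our assumption.

```python
def rna_to_dna(seq: str) -> str:
    """Convert an RNA sequence to the DNA alphabet by replacing U with T."""
    return seq.upper().replace("U", "T")


def percent_identity_aligned(query_row: str, subject_row: str) -> float:
    """Percent identity over the columns of an existing pairwise alignment.

    Both rows must be non-empty and of equal length; '-' marks a gap. Every
    column counts in the denominator, so insertions and deletions lower the
    score.
    """
    if len(query_row) != len(subject_row) or not query_row:
        raise ValueError("aligned rows must be non-empty and of equal length")
    q, s = rna_to_dna(query_row), rna_to_dna(subject_row)
    identical = sum(1 for a, b in zip(q, s) if a == b and a != "-")
    return 100 * identical / len(q)
```

With this sketch, the RNA row `AUGGCU` scores 100 % against the DNA row `ATGGCT`, because the comparison is made only after the conversion.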

The invention also encompasses polynucleotides that hybridize to the nucleic acid sequence
encoding said mammalian GnTIII. A polynucleotide "hybridizes" to another polynucleotide when a
single-stranded form of the polynucleotide can anneal to the other polynucleotide under the
appropriate conditions of temperature and solution ionic strength (see, Sambrook, et al., supra). The
conditions of temperature and ionic strength determine the "stringency" of the hybridization. For
preliminary screening for homologous nucleic acids, low stringency hybridization conditions,
corresponding to a temperature of 42 °C, can be used (e.g., 5X SSC, 0.1 % SDS, 0.25 % milk, and no
formamide; or 40 % formamide, 5X SSC, 0.5 % SDS). Moderate stringency hybridization
conditions correspond to a higher temperature of 55 °C (e.g., 40 % formamide, with 5X or 6X SSC).
High stringency hybridization conditions correspond to the highest temperature of 65 °C (e.g., 50 %
formamide, 5X or 6X SSC). Hybridization requires that the two nucleic acids contain complementary
sequences, although depending on the stringency of the hybridization, mismatches between bases are
possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the
nucleic acids and the degree of complementarity, variables well known in the art. The greater the
degree of similarity or homology between two nucleotide sequences, the greater the value of Tm
(melting temperature) for hybrids of nucleic acids having those sequences. The relative stability
(corresponding to higher Tm) of nucleic acid hybridizations decreases in the following order:
RNA:RNA, DNA:RNA, DNA:DNA.
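The three stringency levels above can be captured as a small lookup table. The sketch below simply transcribes the example conditions; treating each reference temperature as the lower bound of its level when classifying an arbitrary temperature is our assumption, as is the helper name `stringency_for`.

```python
from typing import Dict

# Example conditions transcribed from the text (SSC: sodium chloride/sodium citrate buffer).
STRINGENCY_CONDITIONS: Dict[str, dict] = {
    "low": {"temp_c": 42, "buffers": [
        "5X SSC, 0.1 % SDS, 0.25 % milk, no formamide",
        "40 % formamide, 5X SSC, 0.5 % SDS"]},
    "moderate": {"temp_c": 55, "buffers": ["40 % formamide, 5X or 6X SSC"]},
    "high": {"temp_c": 65, "buffers": ["50 % formamide, 5X or 6X SSC"]},
}


def stringency_for(temp_c: float) -> str:
    """Return the most stringent level whose reference temperature is at or
    below temp_c (using the temperatures as lower bounds is an assumption)."""
    level = None
    # Walk the levels from lowest to highest reference temperature
    for name, cond in sorted(STRINGENCY_CONDITIONS.items(),
                             key=lambda item: item[1]["temp_c"]):
        if temp_c >= cond["temp_c"]:
            level = name
    if level is None:
        raise ValueError("temperature is below the lowest listed condition")
    return level
```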

Expression of GnTIII and other Heterologous Proteins in Plant Host Systems

In one embodiment, the nucleic acid encoding the mammalian GnTIII or other heterologous
proteins, such as a heterologous glycoprotein or mammalian glycosyltransferase, may be inserted into
an appropriate expression vector, i.e., a vector which contains the necessary elements for the
transcription and translation of the inserted coding sequence, or in the case of an RNA viral vector,
the necessary elements for replication and translation, as well as selectable markers. These include,
but are not limited to, a promoter region, a signal sequence, 5' untranslated sequences, an initiation
codon (depending upon whether or not the structural gene comes equipped with one), and
transcription and translation termination sequences. Methods for obtaining such vectors are known in
the art (see WO 01/29242 for review).

Promoter sequences suitable for expression in plants are described in the art, e.g., WO
91/198696. These include non-constitutive promoters or constitutive promoters, such as the
nopaline synthetase and octopine synthetase promoters, the cauliflower mosaic virus (CaMV) 19S
and 35S promoters and the figwort mosaic virus (FMV) 35S promoter (U.S. Pat. No. 6,051,753).
Promoters used may also be tissue-specific promoters targeted, for example, to the endosperm,
aleurone layer, embryo, pericarp, stem, leaves, kernels, tubers, roots, etc.

A signal sequence allows processing and translocation of a protein where appropriate. The 
signal can be derived from plants or could be non-plant signal sequences. The signal peptides direct 
the nascent polypeptide to the endoplasmic reticulum, where the polypeptide subsequently undergoes 
post-translational modification. Signal peptides can routinely be identified by those of skill in the 
art. They typically have a tripartite structure, with positively charged amino acids at the N-terminal
end, followed by a hydrophobic region and then the cleavage site within a region of reduced 
hydrophobicity. 
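
The tripartite layout can be made concrete with a crude sequence scan. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 7-residue window, a threshold of 1.6 and a 30-residue search region; these choices and the example sequences are illustrative assumptions, not part of this document. It checks only the positively charged n-region and the hydrophobic h-region, not the cleavage site, and dedicated signal-peptide predictors are far more reliable.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq: str, window: int = 7) -> list[float]:
    """Mean hydropathy for every run of `window` consecutive residues."""
    seq = seq.upper()
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def looks_like_signal_peptide(seq: str, window: int = 7, threshold: float = 1.6,
                              n_region: int = 5, search: int = 30) -> bool:
    """Crude test: a K or R among the first `n_region` residues (n-region) and at
    least one window averaging >= `threshold` within the first `search` residues (h-region)."""
    head = seq.upper()[:search]
    has_positive_n_region = any(aa in "KR" for aa in head[:n_region])
    has_h_region = any(h >= threshold for h in window_hydropathy(head, window))
    return has_positive_n_region and has_h_region


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    print(looks_like_signal_peptide("MKRLLLALVLLAALAVSQA"))  # True
    print(looks_like_signal_peptide("MSEDKTQENGSSDKRPEQ"))   # False
```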

The transcription termination region is typically located at the opposite end from the transcription
initiation regulatory region. It may be associated with the transcriptional initiation region or derived
from a different gene and may be selected to enhance expression. Examples are the NOS terminator from
the Agrobacterium Ti plasmid and the rice α-amylase terminator. Polyadenylation tails may also be
added. Examples include but are not limited to the Agrobacterium octopine synthase signal (Gielen, et
al, EMBO J. 3:835-846, 1984) or nopaline synthase of the same species (Depicker, et al, Mol Appl
Genet 1:561-573, 1982).

Enhancers may be included to increase and/or maximize transcription of the heterologous 
protein. These include, but are not limited to, peptide export signal sequences, codon usage, introns,
polyadenylation, and transcription termination sites (see WO 01/29242).

Markers include herbicide tolerance and prokaryote selectable markers. Such markers
include resistance toward antibiotics such as ampicillin, tetracycline, kanamycin, and spectinomycin. 
Specific examples include but are not limited to streptomycin phosphotransferase (spt) gene coding 
for streptomycin resistance, neomycin phosphotransferase (nptII) gene encoding kanamycin or
geneticin resistance, and hygromycin phosphotransferase (hpt) gene encoding resistance to hygromycin.

The vectors constructed may be introduced into the plant host system using procedures 
known in the art (reviewed in WO 01/29242 and WO 01/31045). The vectors may be modified to
intermediate plant transformation plasmids that contain a region of homology to an Agrobacterium
tumefaciens vector and a T-DNA border region from A. tumefaciens. Alternatively, the vectors used in
the methods of the present invention may be Agrobacterium vectors. Methods for introducing the
vectors include but are not limited to microinjection, velocity ballistic penetration by small particles
with the nucleic acid either within the matrix of small beads or particles or on the surface, and
electroporation. The vector may be introduced into a plant cell, tissue or organ. In a specific
embodiment, once the presence of a heterologous gene is ascertained, a plant may be regenerated
using procedures known in the art.

Uses of Mammalian GnTIII

The expression of mammalian GnTIII leads to bisected N-glycans on glycoproteins. 
Bisected N-glycans are important for biological activity of some mammalian glycoproteins. In 
particular, bisected monoclonal antibodies have enhanced ADCC (antibody-dependent cellular 
cytotoxicity). Introduction of bisected structures leads to new or optimized functionalities and
increased bioavailability of glycoproteins, e.g., increasing the antennary type increases half-life
because of reduced clearance by the kidney. Accordingly, the invention is directed to a plant host
system comprising said heterologous glycoprotein having bisecting oligosaccharides, particularly 
bisecting GlcNAc residues and methods for producing said glycoprotein. 

Furthermore, expression of GnTIII in plants leads to a drastic increase of terminal GlcNAc residues
compared to wildtype plants (plant N-glycans contain far fewer GlcNAc residues than mammalian
N-glycans). With more GlcNAc residues, the N-glycans of plant glycoproteins, or of recombinant
glycoproteins produced in plants, more closely resemble the N-glycans of mammalian glycoproteins. The
introduction of bisected GlcNAc in plant N-glycans (and in plant-produced recombinant glycoproteins such
as MAbs) due to GnTIII expression in plants seems to protect the N-glycan from degradation by
β-N-acetylhexosaminidases. More GlcNAc residues means more acceptor substrate for
β(1,4)-galactosyltransferase (GalT), which adds terminal galactose. Co-expression of GnTIII and a
functional protein, such as a transporter or a (mammalian) enzyme or functional fragment thereof providing
N-glycan biosynthesis, such as a galactosyltransferase (e.g., GalT), or crossing GnTIII plants with
GalT plants and vice versa, leads to increased galactosylation of glycoproteins produced in these
plants. Accordingly, the invention is directed to a plant host system comprising said mammalian
GnTIII and said functional protein; the plant host system may further comprise a heterologous
glycoprotein with increased galactosylation relative to a heterologous glycoprotein produced in a
plant host system not comprising said mammalian GnTIII and said functional protein, methods for
providing said plant host systems and methods for producing said glycoprotein.

Generating stably transformed plants which produce tailored glycoproteins of commercial
interest can be achieved by inoculating plant cells or tissues with Agrobacterium strains containing
a vector which comprises both nucleotide sequences encoding GnTIII, and optionally other N-glycosylation
modifying enzymes, and nucleotide sequences encoding commercially interesting heterologous
glycoproteins, or by the procedures described above, such as electroporation, microinjection, or velocity
ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or
particles or on the surface. Alternatively, stably transformed plants which
produce tailored glycoproteins of commercial interest can be generated by simultaneous
inoculation (cotransformation) of two or more Agrobacterium strains, each carrying a vector
comprising either nucleotide sequences encoding GnTIII, optionally other N-glycosylation
modifying enzymes, or nucleotide sequences encoding glycoproteins of commercial interest, or by direct
cotransformation of plant cells or tissues with said vectors. Alternatively, stably transformed plants
which produce tailored glycoproteins of commercial interest can be generated by (multiple)
crossing(s) of plants with modified N-glycosylation with plants which express nucleotide sequences
encoding proteins of commercial interest. In all of these procedures, the vector may also comprise a
nucleotide sequence which confers resistance against a selection agent.

In order to obtain satisfactory expression of the proteins involved in N-glycosylation, GnTIII
and the glycoproteins or polypeptides of commercial interest, the nucleotide sequences may be
adapted to the specific transcription and translation machinery of the host plant, as known to people
skilled in the art. For example, silent mutations in the coding regions may be introduced to improve
codon usage, and specific promoters may be used to drive expression of the said genes in the relevant
plant tissues. Promoters which are developmentally regulated or which can be induced at will may
be used to ensure expression at the appropriate time, for example, only after plant tissues have been
harvested from the field and brought into controlled conditions. In all these cases, the choice of
expression cassettes of the glycosylation modifying proteins and of the glycoproteins of commercial
interest should be such that they express in the same cells to allow the desired post-translational
modifications of the said glycoprotein.
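
The silent-mutation strategy can be sketched as a synonymous codon swap. In the sketch below, the preference table `PREFERRED` is a hypothetical placeholder, not real tobacco or maize codon-usage data, and the helper names are illustrative; the code only shows that replacing each codon with a chosen synonym leaves the encoded protein unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(cds: str) -> list[str]:
    """Split a coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def optimize_codons(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon by the preferred synonymous codon, where one is given."""
    for aa, codon in preferred.items():
        if CODON_TABLE.get(codon) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    new = "".join(preferred.get(CODON_TABLE[c], c) for c in codons(cds))
    assert translate(new) == translate(cds)  # the substitutions are silent
    return new


# Hypothetical preference table, for illustration only.
PREFERRED = {"L": "CTT", "R": "AGA", "S": "TCT"}

if __name__ == "__main__":
    original = "ATGCTGCGCAGCTAA"                  # encodes M L R S stop
    print(optimize_codons(original, PREFERRED))  # ATGCTTAGATCTTAA
```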

As described above, in a specific embodiment, a plant that can be used in the method of the
present invention is a tobacco plant, or at least a plant related to the genus Nicotiana. However, use
for the invention of other relatively easily transformable plants, such as Arabidopsis thaliana or Zea
mays, or plants related thereto, is also particularly provided. For the production of recombinant
glycoproteins, the use of duckweed offers specific advantages. The plants are in general small and
reproduce asexually through vegetative budding. Nevertheless, most duckweed species have all the
tissues and organs of much larger plants, including roots, stems, flowers, seeds and fronds.
Duckweed can be grown cheaply and very fast as a free-floating plant on the surface of simple liquid
solutions from which it can easily be harvested. It can also be grown on nutrient-rich waste
water, producing valuable products while simultaneously cleaning wastewater for reuse. Particularly
relevant for pharmaceutical applications, duckweed can be grown indoors under contained and
controlled conditions. Stably transformed duckweed can, for example, be regenerated from tissues or
cells after (co)-inoculating with Agrobacterium strains each containing a (binary) vector which
comprises one or more nucleotide sequences of interest encoding N-glycosylation modifying
enzymes and/or genes encoding commercially interesting heterologous glycoproteins. The
duckweed plant may, for example, comprise the genus Spirodela, genus Wolffia, genus Wolffiella, or
the genus Lemna, e.g., Lemna minor, Lemna miniscula and Lemna gibba. Mosses such as
Physcomitrella patens also offer advantages in that they can be grown cheaply under contained conditions.
In addition, the haploid genome of Physcomitrella patens is relatively easy to manipulate.

Expression in tomato fruits also offers specific advantages. Tomatoes can be easily grown in
greenhouses under contained and controlled conditions, and tomato fruit biomass can be harvested
continuously throughout the year in enormous quantities. The watery fraction containing the
glycoproteins of interest can be readily separated from the rest of the tomato fruit, which allows easier
purification of the glycoprotein. Expression in storage organs of other crops, including but not
limited to the kernels of corn, the tubers of potato and the seeds of rapeseed or sunflower, is also an
attractive alternative which provides huge biomass in organs for which harvesting and processing
technology is in place. Expression in nectar offers specific advantages with respect to purity and
homogeneity of the glycoprotein secreted in the nectar.

Alternatively, a plant comprising a heterologous glycoprotein is crossed with a plant
according to the invention comprising GnTIII and optionally at least one functional mammalian
protein, e.g., a transporter or an enzyme providing N-glycan biosynthesis that is normally not present
in plants; progeny from said crossing are harvested, and a desired progeny plant is selected that expresses
said heterologous glycoprotein and expresses GnTIII and optionally a functional (mammalian)
enzyme involved in mammalian-like N-glycan biosynthesis that is normally not present in plants.
This process is known as cross-pollination. In a preferred embodiment, the invention provides a
method according to the invention further comprising selecting a desired progeny plant expressing
said recombinant protein comprising bisecting oligosaccharides, particularly galactose residues, and/or
increased galactosylation. Now that such a plant is provided, the invention also provides use of a
transgenic plant to produce a desired glycoprotein or functional fragment thereof, in particular
wherein said glycoprotein or functional fragment thereof comprises bisecting oligosaccharides and/or
increased galactosylation.

The invention additionally provides a method for obtaining a desired glycoprotein or 
functional fragment thereof comprising cultivating a plant according to the invention until said plant
has reached a harvestable stage, for example when sufficient biomass has grown to allow profitable
harvesting, followed by harvesting said plant with established techniques known in the art,
fractionating said plant with established techniques known in the art to obtain fractionated plant
material, and at least partly isolating said glycoprotein from said fractionated plant material. The
presence of desired proteins may be screened using methods known in the art, preferably using
screening assays where the biologically active site is detected in such a way as to produce a
detectable signal. This signal may be produced directly or indirectly. Examples of such assays
include ELISA or a radioimmunoassay.

The introduction of bisected GlcNAc residues due to expression of GnTIII can also be used
to prevent the removal (degradation) of saccharides from the N-glycan by "blocking" the activity of
glycosidases, e.g., β-N-acetylhexosaminidases, and to prevent the addition of other saccharides
(driven by "other" subsequent glycosyltransferase genes) to the N-linked glycan, e.g., fucosylation and
xylosylation. By controlling localization (e.g., by providing other subcellular targeting signals)
and/or controlling expression levels (e.g., varying levels in independent transgenic plants or using
different promoters), glycoform composition could be modulated. Hence, introduction of bisecting
GlcNAc residues in glycoproteins in plants, including recombinant glycoproteins, inhibits
incorporation of α-1,3-fucose by blocking the activity of α-1,3-fucosyltransferase, of α-1,4-fucose by
blocking α-1,4-fucosyltransferase, of β-1,2-xylose by blocking β-1,2-xylosyltransferase and of
β-1,3-galactose by blocking β-1,3-galactosyltransferase. It also inhibits removal/degradation of
saccharides added to the N-glycan, especially of terminal GlcNAc residues by blocking the activity of
β-N-acetylhexosaminidases and of terminal β-1,4-galactose (added by expression of
β-1,4-galactosyltransferase as provided by patent application WO 01/31045) by blocking
β-1,4-galactosidase. Thus, controlled expression of GnTIII and controlled introduction of bisecting
GlcNAc residues can be used to steer glycoform composition and/or limit glycoform heterogeneity.

Modified GnTIII and GnTIII Hybrid Proteins

The invention is further directed to an isolated hybrid protein comprising a catalytic portion
of mammalian GnTIII and a transmembrane portion of a protein, said protein residing in the
endoplasmic reticulum or Golgi apparatus of a eukaryotic cell. The invention is also directed to a
modified mammalian GnTIII, wherein the transmembrane domain is removed, but comprising a
retention signal such as KDEL for retention of said GnTIII in the ER.

A nucleic acid sequence encoding a hybrid enzyme comprising a transmembrane portion of a 
first enzyme and a catalytic portion of a second enzyme may be obtained as follows. The sequence 
encoding the transmembrane portion is removed from the second enzyme, leaving a nucleic acid 
sequence comprising a nucleic acid sequence encoding the C-terminal portion of the second enzyme, 
which encompasses the catalytic site. The sequence encoding the transmembrane portion of the first
enzyme is isolated or obtained via PCR and ligated to the sequence encoding a sequence comprising
the C-terminal portion of the second enzyme. 
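
At the sequence level, the fusion described above amounts to joining the 5' part of one coding sequence to the 3' part of another while keeping the reading frame. The sketch below models only the resulting sequence, not the PCR and ligation steps; the function name, the boundary indices and the toy sequences are hypothetical, and in practice the domain boundaries would come from the annotation of each enzyme.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def make_hybrid_cds(tm_donor: str, tm_end: int,
                    catalytic_donor: str, catalytic_start: int) -> str:
    """Join the first enzyme's coding sequence up to (not including) tm_end, i.e. the
    start codon through the transmembrane-encoding region, to the second enzyme's
    coding sequence from catalytic_start through its stop codon."""
    tm_donor, catalytic_donor = tm_donor.upper(), catalytic_donor.upper()
    if tm_end % 3 or catalytic_start % 3:
        raise ValueError("fusion points must fall on codon boundaries to keep the frame")
    if not tm_donor.startswith("ATG"):
        raise ValueError("the transmembrane donor must start with ATG")
    if catalytic_donor[-3:] not in STOP_CODONS:
        raise ValueError("the catalytic donor must end in a stop codon")
    return tm_donor[:tm_end] + catalytic_donor[catalytic_start:]


if __name__ == "__main__":
    # Toy sequences: the first 12 nt stand in for the cytoplasmic tail plus transmembrane
    # region of enzyme 1; everything from nt 6 of enzyme 2 stands in for its catalytic part.
    print(make_hybrid_cds("ATGAAACCCGGGTTTTAA", 12, "ATGCCCGAAGACTAG", 6))
```

The frame check matters because a fusion point that is not a multiple of three would shift every downstream codon of the catalytic portion.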

A nucleic acid sequence encoding a protein, particularly enzymes such as
galactosyltransferases, mannosidases and N-acetylglucosamine transferases, that is retained in the
ER may be obtained by removing the sequence encoding the transmembrane fragment,
substituting it with a methionine (initiation of translation) codon, and inserting between the last
codon and the stop codon of the enzyme (e.g., galactosyltransferase) the nucleic acid sequence
encoding an ER retention signal, such as the sequence encoding KDEL (amino acid residue sequence:
lysine-aspartic acid-glutamic acid-leucine) (Rothman, 1987).
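
The ER-retention construct can be sketched the same way: drop the 5' region through the transmembrane-encoding codons, restore a start codon, and place KDEL codons just before the stop codon. The function name, the boundary `tm_end`, the toy sequence and the particular KDEL codons (AAG GAC GAG CTG, one of several possible encodings) are assumptions for illustration.

```python
from itertools import product

# Standard genetic code in TCAG order, used here only to check the result.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}
KDEL_CODONS = "AAGGACGAGCTG"  # AAG (K) GAC (D) GAG (E) CTG (L)


def translate(cds: str) -> str:
    """Translate an upper-case coding sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def make_er_retained_cds(cds: str, tm_end: int) -> str:
    """Remove codons before tm_end (start codon through transmembrane region), add a
    new ATG, and insert KDEL codons between the last sense codon and the stop codon."""
    cds = cds.upper()
    if len(cds) % 3 or tm_end % 3:
        raise ValueError("lengths and boundaries must be multiples of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must end in a stop codon")
    if not 3 <= tm_end < len(cds) - 3:
        raise ValueError("tm_end must leave some coding sequence before the stop codon")
    return "ATG" + cds[tm_end:-3] + KDEL_CODONS + cds[-3:]


if __name__ == "__main__":
    toy = "ATG" + "CTGCTGCTG" + "GCCGAATGG" + "TAA"  # start + toy TM region + toy body + stop
    retained = make_er_retained_cds(toy, tm_end=12)
    print(retained, translate(retained))  # ATGGCCGAATGGAAGGACGAGCTGTAA MAEWKDEL*
```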

Besides controlling expression, relocalization of GnTIII activity may also be controlled by
making a fusion of the gene sequence coding for the enzymatic part of GnTIII with a transmembrane
domain of other glycosyltransferases or enzymes/proteins residing in the endoplasmic reticulum (ER)
or Golgi apparatus membrane, or by adding a so-called retention signal, such as but not limited to
KDEL, for retention in the ER. Such relocalization modulates the addition of specific saccharides to
the N-linked glycan of glycoproteins, including recombinant glycoproteins, and the prevention of
their removal.

The exchange of the transmembrane domain of GnTIII with that of, for example, GnTI
(TmGnTI), mannosidase II (TmManII), xylosyltransferase (TmXyl) or α-1,3-fucosyltransferase
(TmFuc), but not limited to these, enables earlier expression of GnTIII and introduction of bisecting
GlcNAc at positions 20 to 22 in Figure 2. This prevents subsequent glycosyltransferases such as
xylosyltransferase and fucosyltransferase from acting on the substrate, leading to glycoforms lacking
Xyl and Fuc. Importantly, the addition of terminal galactose by the action of
β(1,4)-galactosyltransferase (GalT) is not inhibited by the bisecting GlcNAc. Co-expression of GalT
(Bakker, et al, "Galactose-extended glycans of antibodies produced by transgenic plants" Proc. Nat.
Acad. Sci. USA 98:2899-2904, 2001) results in structures similar to those indicated to the right of the
arrows annotated with 20, 21 and 22 in Figure 2. Although devoid of immunogenic xylose and
fucose residues, these structures have only one arm processed to complex type glycans. To allow
conversion of the other arm as well, in addition to relocating GnTIII, Mannosidase II (ManII) and
GnTII are also relocated in the Golgi to act earlier in the glycan processing sequence. This can be
achieved in several ways, for example, by exchanging their respective transmembrane domains
for that of GnTI (TmGnTI), which results in relocation to the position indicated by 5 in Figure 2.
Alternatively, both ManII and GnTII can be relocalized to the ER by removing the transmembrane
Golgi targeting domain and supplying the remaining enzyme fragments with a C-terminal ER
retention signal (e.g., the amino acid residues KDEL). A plant expressing GalT (Bakker, et al.,
"Galactose-extended glycans of antibodies produced by transgenic plants" Proc. Nat. Acad. Sci. USA
98:2899-2904, 2001) as well as the relocated versions of GnTIII (e.g., TmXyl-GnTIII), ManII (e.g.,
TmGnTI-ManII) and GnTII (e.g., TmGnTI-GnTII) can then be crossed with plants expressing the
recombinant glycoprotein of interest (Figure 3) or can be retransformed with the gene encoding the
glycoprotein of interest, such as the genes encoding an antibody. This allows the production of
recombinant glycoproteins having bisected glycans with terminal galactose residues which are
devoid of xylose and fucose. Transformation procedures and crossing (co-pollination) procedures
are described above.

In another embodiment, GnTIII with the transmembrane domain of Mannosidase II (TmManII-GnTIII)
or xylosyltransferase (TmXyl-GnTIII) is combined with TmXyl-GalT, TmGnTI-GnTII and
TmGnTI-ManII. This combination could be obtained either by coexpression or by combining
the genes involved through cross-pollination, and leads to glycoproteins, including recombinant
glycoproteins, lacking xylose and fucose on the core sequence but having bisected GlcNAc residues
on the trimannosyl core and terminal galactose.

EXAMPLES

The effect of the introduction of GnTIII in plants on the occurrence of bisected
oligosaccharides on the glycans of plant glycoproteins has been evaluated. The human gene for
GnTIII has been cloned, a C-terminal c-myc tag has been added for analysis of expression of the
tagged fusion protein, and the whole has been placed under control of plant regulatory elements
for introduction in tobacco. It is shown that GnTIII is expressed in plants and that expression results
in bisected oligosaccharide structures on endogenous plant glycoproteins. The amount of N-glycans
containing at least two GlcNAc residues more than doubled compared to those found in normal
tobacco plants. Remarkably, the expression of GnTIII also resulted in a significant reduction of
complex type N-glycan degradation products, as apparent from matrix-assisted laser desorption
ionization time-of-flight (MALDI-TOF) analyses of the isolated glycans of endogenous plant
glycoproteins. These data suggest that expression of GnTIII in tobacco, resulting in the introduction
of bisected structures on N-glycans, protects the glycans from degradation by
β-N-acetylhexosaminidases. β-N-acetylhexosaminidases have broad specificity for non-reducing
terminal GlcNAc and N-acetylgalactosamine (GalNAc), cleaving amongst others the GlcNAcβ1-2
linkages typically present on the trimannosyl core (Manα1-3 and Manα1-6).

Example 1 

Plasmids and plant transformation. PAC clone RP5-1104E15 GnTIII (SEQ ID NO: 1,
Figure 4A) was obtained from Pieter J. de Jong, Children's Hospital Oakland Research Institute
(CHORI) and is available on request through the Sanger Center as part of clone set HBRC_1.sc. The
clone originates from Homo sapiens, male, blood and can be requested through
http://www.sanger.ac.uk/Teams/Team63/CloneRequest/ (from human chromosome 22q12.3-13.1;
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge,
CB10 1SA, UK; www.sanger.ac.uk).

The human gene for GnTIII was cloned from said PAC clone by PCR using AccuTaq LA 
DNA polymerase (Sigma-Aldrich) and primers GNT3F (5'-atactcgagttaacaatgaagatgagacgct-3'; SEQ
ID NO: 3) and GNT3Rmyc (5'-tatggatcctaattcagatcctcttctgagatgag-3'; SEQ ID NO: 4). Oligos were
from Eurogentec (Belgium). PCR was performed on a Perkin Elmer Cetus 480 thermal cycler
(ABI/PE) using optimal conditions for the AccuTaq polymerase according to the manufacturer. The
resulting fragment was cloned in the EcoRV site of pBluescript SK+ (Stratagene, Inc., La Jolla, CA,
USA) and sequence verified. Sequencing was performed using fluorescently labelled
dideoxynucleotides essentially as described (Sanger, et al, "DNA sequencing with chain-terminating
inhibitors" Proc. Nat. Acad. Sci. USA 74:5463-5467, 1977) and reaction mixtures
were run on an Applied Biosystems 370A or 380 automated DNA sequencer. Data were analysed 
using different software modules freely available on the web and compared with the DNA sequence 
of human GnTIII present in the database. 

A 1.6 kb HpaI/BamHI fragment containing the GnTIII gene with C-terminal c-myc tag was
subsequently cloned into the SmaI/BglII site of pUCAP35S (Van Engelen, et al, "Coordinate
expression of antibody subunit genes yields high levels of functional antibodies in roots of transgenic
tobacco" Plant Molecular Biology 26:1701-1710, 1994) and named pAMV-GnTIII. The cauliflower
mosaic virus 35S (CaMV35S) promoter expression cassette with modified GnTIII gene was
subsequently cloned as an AscI/PacI fragment in the binary vector pBINPLUS (Van Engelen, et al,
"Coordinate expression of antibody subunit genes yields high levels of functional antibodies in roots
of transgenic tobacco" Plant Molecular Biology 26:1701-1710, 1994), resulting in pBINPLUS-GnTIII,
and introduced in Agrobacterium tumefaciens strain AGL0 by electroporation. Transformation of
Nicotiana tabacum variety Samsun NN was as described before (Horsch, et al, "A simple and
general method for transferring genes into plants" Science 227:1229-1231, 1985). Sixteen
independent transgenic plants were selected and grown to maturity in the greenhouse as described.
Leaf material was analysed for expression of GnTIII (SEQ ID NO: 2, Figure 4B) and glycan
composition of endogenous cellular glycoproteins. 



Example 2 

Analysis of expression. Total protein extracts of tobacco leaves were prepared as described 
before (Bakker, et al., "Galactose-extended glycans of antibodies produced by transgenic plants" 
Proc. Nat. Acad. Sci. USA 98:2899-2904, 2001). The amount of protein present in samples was
estimated by the Bradford method (Bradford, M.M., "A rapid and sensitive method for the
quantitation of microgram quantities of protein utilizing the principle of protein-dye binding" Anal.
Biochem. 72:248-254, 1976) using bovine serum albumin as standard. Fixed amounts of protein
samples were run on precast 10 or 12 % SDS-PAGE gels (Bio-Rad) under reducing conditions.
Rainbow coloured molecular weight protein markers were from Amersham. Western blot analysis
was performed essentially as described (Bakker, et al, "Galactose-extended glycans of antibodies
produced by transgenic plants" Proc. Nat. Acad. Sci. USA 98:2899-2904, 2001). Separated proteins
were transferred to nitrocellulose (BA85, Schleicher and Schuell, or Trans-Blot Transfer Medium,
Bio-Rad) using a Bio-Rad Mini Trans-Blot Electrophoretic Transfer Cell in
3-[cyclohexylamino]-1-propanesulfonic acid (CAPS) buffer for 60 min. Expression of the
GnTIII-c-myc fusion protein was analysed by immunoblotting using a peroxidase labelled c-myc
antibody. Introduction of bisecting oligosaccharides in endogenous tobacco glycoproteins was
visualized by incubation with biotinylated erythroagglutinating phytohemagglutinin (E-PHA; Vector
Laboratories). Detection was performed by enhanced chemiluminescence using Lumi-Light Western
Blotting Substrate from Roche (Roche Diagnostics GmbH, Mannheim, Germany) on a Lumi-Imager F1
apparatus (Boehringer Mannheim
GmbH, Mannheim, Germany) using LumiAnalyst software (version 3.0). 
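
Loading "fixed amounts" of protein depends on reading sample concentrations off a BSA standard curve. The sketch below assumes a linear least-squares fit over a hypothetical set of BSA standards and absorbance readings, and a hypothetical target of 20 µg per lane; none of these numbers come from this document, and the Bradford response is linear only over a limited range.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_ug(absorbance: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: micrograms of protein that give this absorbance."""
    return (absorbance - intercept) / slope


def load_volume_ul(target_ug: float, sample_absorbance: float, sample_volume_ul: float,
                   slope: float, intercept: float) -> float:
    """Volume of extract (uL) containing target_ug of protein."""
    conc = protein_ug(sample_absorbance, slope, intercept) / sample_volume_ul  # ug per uL
    return target_ug / conc


# Hypothetical BSA standards (ug per assay) and A595 readings, for illustration only.
BSA_UG = [0, 2, 4, 6, 8, 10]
A595 = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50]

if __name__ == "__main__":
    slope, intercept = fit_line(BSA_UG, A595)
    # A 5 uL aliquot reading 0.25 holds about 5 ug (about 1 ug/uL), so about 20 uL gives 20 ug.
    print(round(load_volume_ul(20, 0.25, 5, slope, intercept), 1))  # 20.0
```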



Example 3 

Matrix-assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF) Mass 
Spectrometry. For the analysis of glycan structure, cellular proteins were isolated from tobacco
leaves of a selected plant transformed with human GnTIII (GnTIII-17). Protein isolation and N-glycan
preparation were performed as described (Elbers, et al., 2001). The N-glycans were desalted
on a nonporous, graphitized carbon-black column (Carbograph Ultra Clean Tubes, Alltech
Associates) before mass spectrometry analysis as described. MALDI-TOF spectra were measured on
a Micromass (Manchester, U.K.) TofSpec E MALDI-TOF mass spectrometer. Mass spectra were
recorded in positive mode using 2,5-dihydroxybenzoic acid as the matrix, essentially as
described (Elbers, et al., 2001).

Expression of human GnTIII introduces bisecting N-glycans on endogenous glycoproteins in
N. tabacum. Human GnTIII was introduced in tobacco plants by Agrobacterium-mediated
transformation of binary vector pBINPLUS-GnTIII containing a cDNA harbouring the complete
coding sequence fused to a C-terminal c-myc tag under control of the constitutive CaMV35S
promoter. Sixty independent transgenic shoots selected for kanamycin resistance were obtained
and analysed for expression of the GnTIII-c-myc fusion protein using the c-myc antibody.
Analysis revealed that all expressed the gene at various levels. Fourteen were selected, rooted and
transferred to the greenhouse. One plant (GnTIII-17), selected for high expression of the
GnTIII-c-myc fusion protein using the c-myc antibody, was analysed for the occurrence of bisected
GlcNAc residues on N-glycans of endogenous tobacco glycoproteins using a specific binding assay
with the biotinylated lectin E-PHA. SDS-PAGE of protein extracts followed by transfer to
nitrocellulose and analysis using the specific binding assay with the biotinylated E-PHA lectin
revealed that endogenous tobacco glycoproteins of GnTIII-17 contained bisected oligosaccharides
whereas those of control tobacco did not. GnTIII-17 was multiplied in the greenhouse for further detailed analysis
of glycan structure by MALDI-TOF. 

Example 4 

Oligosaccharide distributions and level of bisected complex oligosaccharides in 
wildtype and selected transgenic GnTIII-17 tobacco plant. Endogenous glycoproteins were
isolated from young leaves of a control tobacco plant and the selected GnTIII-17 plant to investigate in
detail the effect of expression of human GnTIII on the structure of glycans N-linked to glycoproteins.
A comparison of the structures of the N-glycans isolated from glycoproteins present in leaves of
control wild-type tobacco plants with those from plant GnTIII-17 using MALDI-TOF is represented
in Figure 1. MALDI-TOF allows for the detection of different molecular species in the pool of the
N-glycans (glycoforms) and shows a mixture of ions that were assigned to (M+Na)+ adducts of
high-mannose (Man)-type N-glycans ranging from d, Man5 to n, Man9 and of mature N-glycans from the
truncated structure a, XM3GN2 to m, GN3FXM3GN2 (for structure see Table 1; for a summary of
the data see Table 2). In addition to the N-glycans characterized in the control plants (Figure 1A),
the MALDI-TOF MS of the glycan mixture from plant GnTIII-17 (Figure 1B) showed at least two
ions assigned to N-linked glycans that result from the action of the human GnTIII enzyme (for a
comparison see Table 1 and Table 2). These oligosaccharides, GN3XM3GN2 (i) and
GN3FXM3GN2 (k), representing 8 % and 31 % of the population respectively, contain three
GlcNAc residues, each linked to one of the three mannoses of the trimannosyl core structure of the
N-linked glycan.
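
The structure codes used in Table 1 (GN for GlcNAc, M for mannose, F for fucose, X for xylose, with counts) map directly onto a monoisotopic (M+Na)+ mass. The sketch below assumes standard monoisotopic residue masses and writes the high-mannose glycans as M5GN2 and M9GN2 in place of the document's Man5 and Man9; the computed values are theoretical and are not read from Figure 1.

```python
import re

# Monoisotopic residue masses (Da), i.e. free monosaccharide minus water.
RESIDUE_MASS = {
    "GN": 203.07937,  # HexNAc, here GlcNAc
    "M": 162.05282,   # Hex, here mannose
    "F": 146.05791,   # deoxyhexose, here fucose
    "X": 132.04226,   # pentose, here xylose
}
WATER = 18.01056       # H2O at the reducing end of the free glycan
SODIUM_ION = 22.98922  # Na+ (sodium atom minus one electron)

_VALID = re.compile(r"(?:(?:GN|F|X|M)\d*)+")
_TOKEN = re.compile(r"(GN|F|X|M)(\d*)")


def composition(code: str) -> dict[str, int]:
    """Count residues in a code such as 'GN3FXM3GN2' (3 GlcNAc, Fuc, Xyl, 3 Man, 2 GlcNAc)."""
    if not _VALID.fullmatch(code):
        raise ValueError(f"unrecognised glycan code: {code!r}")
    counts: dict[str, int] = {}
    for unit, n in _TOKEN.findall(code):
        counts[unit] = counts.get(unit, 0) + (int(n) if n else 1)
    return counts


def sodium_adduct_mz(code: str) -> float:
    """Theoretical monoisotopic m/z of the singly charged (M+Na)+ ion."""
    counts = composition(code)
    return sum(RESIDUE_MASS[u] * n for u, n in counts.items()) + WATER + SODIUM_ION


if __name__ == "__main__":
    for code in ["XM3GN2", "FXM3GN2", "GN2FXM3GN2", "GN3FXM3GN2", "M5GN2", "M9GN2"]:
        print(code, round(sodium_adduct_mz(code), 2))
```

Because only composition enters the calculation, a bisected and a non-bisected isomer with the same residues give identical values, which is consistent with the limitation that MALDI-TOF as performed here cannot distinguish β(1,2)- from β(1,4)-linked GlcNAc.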

Analysis of glycan structure through MALDI-TOF as performed here cannot distinguish 
between GlcNAc residues β(1,2)- or β(1,4)-linked to mannose. Hence, it was not clear whether or to
what extent the structures GN2XM3GN2 and GN2FXM3GN2 have bisecting oligosaccharides.
Additional experiments are required to reveal whether these structures are a mix of normal and
bisected oligosaccharides or a single compound.

In the light of the observed lethality of CHO cells that overexpress GnTIII (Umana, et al,
"Engineered glycoforms of an antineuroblastoma IgG1 with optimized antibody-dependent cellular
cytotoxic activity" Nature Biotechnology 17:176-180, 1999), it is remarkable that transgenic plants
having significant amounts of bisected glycans look phenotypically normal and are completely fertile
(can be cross-pollinated and self-pollinated).

Example 5

Expression of human GnTIII in tobacco seems to protect N-glycans from degradation
by β-N-acetylhexosaminidases and more than doubles terminal N-glucosaminylation. MALDI-TOF
analysis of extracts clearly showed that at least 40 % of the population of glycoforms now has a
bisecting GlcNAc in complex-type N-linked glycans of cellular tobacco proteins through the action
of the GnTIII enzyme. Moreover, 70 % of the population of complex-type N-linked glycans of
endogenous glycoproteins of GnTIII-17 has two or three terminal GlcNAc residues, compared to
about 30 % for wildtype tobacco (Table 1). The observed de novo synthesis of at least 40 % bisected
complex-type N-linked glycans upon expression of GnTIII in tobacco (Figure 1B, Table 1 and Table
2) coincides with the disappearance of mainly FXM3GN2 (b, from 30 % to 4 %) and GNFXM3GN2
(f, from 10 % to 2 %) and, to a minor degree, GN2FXM3GN2 (j, from 29 % to 19 %). In addition, it
also coincides with a significant increase in GN2XM3GN2 (h) from 4 % in wildtype tobacco to 14 % in
GnTIII-17. Whether the latter GN2XM3GN2 (h) in GnTIII plants has the second GlcNAc linked to
the β-linked mannose of the trimannosyl core of the N-linked glycan, and hence is the result of
GnTIII activity, or to the second α-linked mannose of the trimannosyl core remains to be
investigated (see above). 

Saccharides a, b and c, accounting for 40 % of the N-linked glycans in wildtype tobacco 
plants, are degradation products expected to have arisen from mature glycans of endogenous tobacco 
glycoproteins after GnTI activity, since an Arabidopsis thaliana mutant lacking GnTI activity did not 
contain xylose and fucose residues in the N-glycans of endogenous glycoproteins (Von Schaewen, et 
al., "Isolation of a mutant Arabidopsis plant that lacks N-Acetylglucosaminyl transferase I and is 
unable to synthesize golgi-modified complex N-linked glycans" Plant Physiology 102:1109-1118, 
1993). The 7-fold decrease (40 % to 6 %) in these structures in GnTIII-17, together with the threefold 
reduction of GNXM3GN2 and XM3GN2 (12 % to 4 %), suggests that the introduction of a bisecting 
GlcNAc protects the mature N-linked glycan from degradation by endogenous glycosidases, 
especially β-N-acetylhexosaminidases that remove terminal GlcNAc. The total amount of N-linked 
glycans expected to have arisen from degradation of mature, full-length N-linked glycans has hence 
decreased fivefold (from 52 % to 10 %). 
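The fold changes quoted in this example follow directly from the percentages given in the text. A minimal Python sketch that recomputes them is shown below; the dictionary labels are descriptive names chosen for illustration, not terms taken from Table 1.

```python
def fold_change(before: float, after: float) -> float:
    """Ratio of the larger to the smaller of two percentages (always >= 1)."""
    if before <= 0 or after <= 0:
        raise ValueError("percentages must be positive")
    return max(before, after) / min(before, after)


# (wildtype %, GnTIII-17 %) pairs as quoted in the text of Example 5
QUOTED = {
    "putative degradation products a, b and c": (40.0, 6.0),
    "GNXM3GN2 + XM3GN2": (12.0, 4.0),
    "all putative degradation products": (52.0, 10.0),
    "two or three terminal GlcNAc residues": (30.0, 70.0),
}

# The 52 % and 10 % totals are the sums of the two sub-groups above.
assert QUOTED["all putative degradation products"] == (40.0 + 12.0, 6.0 + 4.0)

if __name__ == "__main__":
    for label, (wt, gnt) in QUOTED.items():
        direction = "decrease" if gnt < wt else "increase"
        print(f"{label}: {wt:g} % -> {gnt:g} % ({fold_change(wt, gnt):.1f}-fold {direction})")
```

The printed ratios (6.7, 3.0, 5.2 and 2.3) correspond to the rounded "7-fold", "threefold", "fivefold" and "more than doubles" wording used above.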
Example 6 

Vector construction and DNA preparation for maize transformation. The human 
GNTIII gene along with its 3' c-myc immunodetection tag was obtained by PCR from plasmid 
pAMV-GNTIII by the following method. Primers MS20 and MS19, homologous to the 5' and 3' ends 
of the hGNTIII gene respectively, were designed and synthesized to add a PmeI site and a stop 
codon to the 3' end of the gene. 

MS20 (5' NcoI site): 5'-CCATGGTGATGAGACGCTAC-3' (SEQ ID NO: 5) 

MS19 (adds stop and PmeI site 3'): 5'-GTTTAAACCTAGGATCCTAATTCAGATCCTCT-3' (SEQ ID NO: 6) 
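The primer features described above can be checked by simple string operations. The sketch below locates the NcoI site (CCATGG, which contains an ATG) in MS20, converts MS19 to the coding-strand orientation by reverse complementation, and finds the PmeI site (GTTTAAAC) and the first stop codon. The reading frame offset of 1 is an assumption: it is the frame that reads GAG GAT CTG (Glu-Asp-Leu, presumably the end of the c-myc tag) directly before the stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
NCOI_SITE = "CCATGG"    # NcoI recognition site; contains an ATG start codon
PMEI_SITE = "GTTTAAAC"  # PmeI recognition site (palindromic, blunt-cutting)

MS20 = "CCATGGTGATGAGACGCTAC"              # SEQ ID NO: 5
MS19 = "GTTTAAACCTAGGATCCTAATTCAGATCCTCT"  # SEQ ID NO: 6

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written in A/C/G/T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def first_stop(seq: str, frame: int) -> int:
    """0-based index of the first stop codon in the given frame (0, 1 or 2), or -1."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


if __name__ == "__main__":
    coding_3prime = reverse_complement(MS19)  # MS19 as it reads on the coding strand
    print("NcoI site in MS20 at index", MS20.find(NCOI_SITE))
    print("coding-strand 3' end:", coding_3prime)
    print("PmeI site at index", coding_3prime.find(PMEI_SITE))
    print("first stop (frame +1) at index", first_stop(coding_3prime, 1))
```

Because both recognition sites are palindromic, they read the same on either strand; the stop codon (TAG) lies upstream of the PmeI site, as the primer design requires.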

Following gel electrophoresis to identify the correctly sized PCR product, the 1.6 kbp PCR 
product was recovered from the gel with a QIAquick Gel Extraction Kit (Qiagen, Valencia, CA). 
Plasmid pDAB4005 (see SEQ ID NO: 8) (Figures 5A and 5B), which contains a Zmubi/GUS/per5 
cassette (Christensen, et al., Plant Molec. Biol. 18:675-689, 1992), was digested with NcoI and 
PmeI to release the GUS gene, and the vector fragment was recovered from a gel with a QIAquick 
Gel Extraction Kit (Qiagen, Valencia, CA). 

Following digestion with NcoI and PmeI, the PCR-derived hGNTIII fragment was ligated to 
the vector fragment left after digestion of pDAB4005 with NcoI and PmeI, to create the intermediate 
plasmid pDAB7119 (see SEQ ID NO: 9) (Figures 6A and 6B). Intermediate plasmid pDAB7119 
was cut with SpeI and SphI to release the hGNTIII plant expression cassette, which was treated with 
T4 DNA polymerase to create blunt ends. 

Plasmid pDAB8504 (SEQ ID NO: 10) (see Figures 7A and 7B), which contains the RB7 
MAR sequences, was digested with SrfI and blunt-ended with T4 DNA polymerase. Following 
treatment with calf intestinal phosphatase, the treated pDAB8504 fragment and the hGNTIII plant 
expression cassette were ligated to create plasmid pDAB7113 (SEQ ID NO: 10) (see Figures 8A 
and 8B), which contains RB7 MAR sequences flanking the gene of interest and the selectable marker 
cassette as follows: RB7 MAR // Zmubi promoter/hGNTIII/per5 3'UTR // Rice actin promoter (D. 
McElroy, et al., "Isolation of an efficient actin promoter for use in rice transformation" The Plant 
Cell 2:163-171, 1990)/PAT/Zm lipase 3'UTR // RB7 MAR. 

The integrity of the GNTIII sequence was checked by sequencing (Big Dye Terminator 
Cycle Sequencing Ready Reaction, Applied Biosystems, Foster City, CA) and was confirmed to 
encode the human GNTIII enzyme. One base change, at nucleotide G384, was found, but this 
substitution does not affect the encoded amino acid, proline 128. 
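Why a change at nucleotide 384 leaves proline 128 intact can be shown with a short calculation. The sketch assumes nucleotides are numbered from the A of the ATG start codon (position 1); under that assumption nucleotide 384 is the third (wobble) base of codon 128, and all four CCN codons encode proline, so any base at that position is synonymous.

```python
from typing import Set, Tuple

PROLINE_CODONS: Set[str] = {"CCT", "CCC", "CCA", "CCG"}


def codon_and_position(nt: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence nucleotide number to (codon number, position 1-3)."""
    if nt < 1:
        raise ValueError("nucleotide numbers start at 1")
    return (nt - 1) // 3 + 1, (nt - 1) % 3 + 1


def third_base_is_silent(prefix: str, codons: Set[str]) -> bool:
    """True if every base at the third position after `prefix` stays within `codons`."""
    return all(prefix + base in codons for base in "ACGT")


if __name__ == "__main__":
    print(codon_and_position(384))
    print(third_base_is_silent("CC", PROLINE_CODONS))
```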

Plasmid pDAB7113 was grown up in 2 L of medium (LB + amp) and purified with a Qiagen 
Plasmid Giga kit to produce 5 milligrams of purified plasmid for plant cell transformation. 
Example 7 

Transformation of maize cells. Plasmid pDAB7113 was introduced into maize cells with 
WHISKERS-mediated DNA transfer essentially as described in the following citations, and as follows 
(Frame, B., et al., "Production of fertile transgenic maize plants by silicon carbide whisker-mediated 
transformation" Plant J. 6:941-948, 1994; Thompson, J., et al., "Maize transformation utilizing 
silicon carbide whiskers: a review" Euphytica 85:75-80, 1995; P. Song, C. Q. Cai, M. Skokut, B. 
Kosegi, and J. Petolino, "Quantitative real-time PCR as a screening tool for estimating transgene 
copy number in Whiskers-derived transgenic maize" Plant Cell Rep. 20:948-954, 2002; all of 
which are incorporated herein by reference). 

Embryogenic maize suspension cell cultures were subcultured on medium G-N6 (N6 medium 
containing 30 gm/L sucrose, 100 mg/L inositol, and 2 mg/L 2,4-D) the day before whisker-mediated 
transformation. On the day of the experiment, cells were pretreated with osmoticum by shaking with 
medium G-N6 containing 0.2 Molar each mannitol and sorbitol for 30 minutes. Thirty-six ml of 
cells were transferred to a 250 ml centrifuge bottle in 50 ml of medium G-N6, to which was added 
8.1 ml of a 5 % (w/v) silicon carbide whiskers suspension (Silar SC-9, Advanced Composite 
Materials, Greer, S.C.) in medium, plus 170 ul of 1 mg/ml plasmid solution (in TE buffer). The 
centrifuge bottle containing cells, whiskers and DNA was agitated vigorously on a modified Red 
Devil brand paint mixer for 10 seconds. Whiskered cells were then shaken for two hours in medium 
with half the level of added osmoticum. Whiskered cells were recovered by filtration on a sterile 
Buchner funnel, and the filter papers were placed on semisolid G-N6 medium for 1 week. After 1 
week the filters were moved to semisolid G-N6 medium containing 1 mg/L Herbiace (a commercial 
formulation of 20 % bialaphos, Meiji Seika, Tokyo, Japan). Two weeks later, the cells were removed 
from the filter paper, mixed with melted G-N6 + 1 mg/L Herbiace (G-N6 + 1H) medium also 
containing 7 gm/L SeaPlaque agarose (BioWhittaker, Rockland, Maine), and spread on top of G-N6 
+ 1H solid medium. Plates were cultured in the dark at 30 °C. Colonies resistant to the selective 
agent were recovered 5-7 weeks post embedding and individually moved to fresh G-N6 + 1H medium 
for further increase of tissue mass. 
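Several quantities in this protocol are simple conversions of the stated concentrations and volumes. The sketch below recomputes the mass of whiskers and of plasmid DNA added to the bottle and the active-ingredient level of the selective agent. The helper names are hypothetical, and reading "20 % bialaphos" as a mass fraction of the Herbiace formulation is an assumption.

```python
def grams_from_percent_wv(volume_ml: float, percent_wv: float) -> float:
    """Grams of material in `volume_ml` of a `percent_wv` % (w/v, g per 100 ml) preparation."""
    return volume_ml * percent_wv / 100.0


def micrograms_from_mg_per_ml(volume_ul: float, conc_mg_per_ml: float) -> float:
    """Micrograms delivered by `volume_ul` microlitres at `conc_mg_per_ml` (1 mg/ml == 1 ug/ul)."""
    return volume_ul * conc_mg_per_ml


def active_mg_per_l(product_mg_per_l: float, active_fraction: float) -> float:
    """Active-ingredient concentration, assuming `active_fraction` is a mass fraction."""
    return product_mg_per_l * active_fraction


if __name__ == "__main__":
    print(f"silicon carbide whiskers added: {grams_from_percent_wv(8.1, 5):.3f} g")
    print(f"plasmid DNA added: {micrograms_from_mg_per_ml(170, 1):.0f} ug")
    print(f"bialaphos in 1 mg/L Herbiace: {active_mg_per_l(1.0, 0.20):.2f} mg/L")
    print(f"osmoticum after halving: {0.2 / 2:.1f} M each of mannitol and sorbitol")
```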



Example 8 

Molecular analysis for copy number of inserted DNA. Tissue from each transgenic isolate 
was individually freeze-dried in a lyophilizer, and DNA was extracted by a standard method 
(DNeasy 96 Plant Kit, Qiagen). The copy number of inserted transgenic DNA was estimated by 
the Invader Operating System, available from Third Wave Technologies (Third Wave Technologies, 
Madison, Wisconsin, twt.com). Primers were designed by Third Wave Technologies 
specifically for the PAT selectable marker, and its copy number was estimated relative to the genomic 
DNA copy number of the endogenous maize alpha-tubulin gene. 
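Estimating copy number "relative to" an endogenous reference gene generally means comparing a target/reference signal ratio with the same ratio in a calibrator of known copy number. The function below is a generic sketch of that ratio arithmetic only; it is not the Invader Operating System's own algorithm, and the function name, the calibrator scheme and the example numbers are all hypothetical.

```python
def relative_copy_number(target_signal: float, reference_signal: float,
                         calibrator_ratio: float, calibrator_copies: float = 1.0) -> float:
    """Estimate transgene copies from a target/reference signal ratio.

    calibrator_ratio: target/reference ratio measured on a calibrator sample that is
    assumed to carry `calibrator_copies` copies of the transgene (hypothetical scheme).
    """
    if reference_signal <= 0 or calibrator_ratio <= 0:
        raise ValueError("reference signal and calibrator ratio must be positive")
    sample_ratio = target_signal / reference_signal  # PAT signal / alpha-tubulin signal
    return calibrator_copies * sample_ratio / calibrator_ratio


if __name__ == "__main__":
    # Hypothetical signals: a sample whose ratio is twice the single-copy calibrator's.
    print(relative_copy_number(target_signal=3.0, reference_signal=3.0, calibrator_ratio=0.5))
```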

Example 9 

Test transgenic maize callus for altered lectin binding due to expression of the GnTIII 
gene. Callus samples from 100 individually isolated unique transgenic events were extracted as 
follows. Samples from each event were fresh frozen in 96-well cluster tube boxes (Costar 1.2 ml 
polypropylene, with lid) along with a steel and a tungsten bead in each well. 450 ul of extraction 
buffer (25 mM sodium phosphate pH 6.6, 100 mM NaCl, 30 mM sodium bisulfate, 1 % v/v Triton 
X-100) was added per well, and the box of samples was pulverized for 3 minutes at full speed on a Kleco 
Bead Mill. The plate was centrifuged (4 °C) at 2500 rpm for 10 minutes. Extracts were removed to a 
96-well deep well plate and frozen for storage. All screening assays were performed on these extracts 
of individual events. 
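The plate spin above is given in rpm, whereas later steps (Example 11) are given as multiples of g. Converting between the two requires the rotor radius, which is not stated here. The sketch below uses the standard relation RCF = 1.118 x 10^-5 x r (cm) x rpm^2; the 10 cm radius in the usage example is a hypothetical value.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard conversion constant for radius in cm and speed in rpm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at `radius_cm` from the rotor axis."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach `rcf` x g at `radius_cm`."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 10.0  # hypothetical plate-rotor radius in cm
    print(f"2500 rpm at {radius:g} cm is about {rcf_from_rpm(2500, radius):.0f} x g")
```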

Protein analyses (microtiter plate protocol, BioRad 500-0006) were made to determine the 
total protein for each extract. 25 ug protein per sample were loaded in 20 ul loading buffer (Laemmli, 
U.K., Nature 227:680, 1970). Gels (4-20 % Criterion PAGE gels, 12+2 wells, BioRad 345-0032) 
were electrophoresed at 65 mA in Tris/glycine/SDS running buffer (BioRad 161-0772). After soaking 
in transfer buffer (running buffer plus 20 % v/v methanol) for 10 minutes, the gels were transferred 
to nitrocellulose membranes using a semi-dry blotter (150 mA/1.5 hrs). The membranes were 
incubated for 30 minutes in blocking buffer (20 mM Tris, 144 mM NaCl, 0.5 % v/v Tween 20, 10 % 
w/v nonfat powdered milk) at room temperature; then the blocking buffer was removed and replaced 
with the primary detection lectin (Phaseolus hemagglutinin E, biotinylated, Vector Laboratories 
B-1125) at 2.5 ug/ml in blocking buffer. The primary detection lectin was incubated on the membrane 
for 1 hour at room temperature. The primary detection solution was removed, the membrane was 
rinsed once with blocking buffer, and the secondary detection solution was added (avidin-HRP, 
BioRad 170-6528, at 1:5000, plus the molecular weight marker detection agent StrepTactin-HRP, 
BioRad 161-0380, at 1:10,000 in blocking buffer). The secondary detection reagent was incubated on 
the membrane for 1 hour at room temperature. During the blocking, primary, and secondary reagent 
steps the solutions were mixed on the blots. The secondary detection reagent was then removed and 
the membrane was rinsed with Tris buffered saline (20 mM Tris, 144 mM NaCl) containing 0.5 % 
Tween 20 three times for 10 minutes each and once more for 5 minutes. After dripping off the excess 
rinse solution, the blot was soaked in ECL substrate (Pierce 34080) for 1 minute, excess ECL solution 
was drained off, and the membrane was exposed to film. Negative controls were included in each gel 
to discriminate the new glycoprotein bands, now visible with this bisecting-glycan-detecting lectin 
reagent, in the transgenic callus extracts. 
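Loading a fixed protein mass per lane and preparing the ratio dilutions of the detection reagents are routine volume calculations. The sketch assumes that the 20 ul figure is the total volume loaded per lane (extract plus loading buffer); the extract concentration of 2.5 ug/ul and the 25 ml of blocking buffer per blot are hypothetical values, and the function names are illustrative.

```python
from typing import Tuple


def lane_volumes(target_ug: float, conc_ug_per_ul: float,
                 lane_total_ul: float = 20.0) -> Tuple[float, float]:
    """Return (extract_ul, loading_buffer_ul) that place `target_ug` protein in `lane_total_ul`."""
    if conc_ug_per_ul <= 0:
        raise ValueError("protein concentration must be positive")
    extract_ul = target_ug / conc_ug_per_ul
    if extract_ul > lane_total_ul:
        raise ValueError("extract too dilute to load the target mass in this volume")
    return extract_ul, lane_total_ul - extract_ul


def stock_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Microlitres of reagent stock for a 1:dilution_factor dilution in `final_volume_ml`."""
    return final_volume_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    print(lane_volumes(25, 2.5))        # extract and buffer volumes for a 2.5 ug/ul extract
    print(stock_volume_ul(25, 5000))    # avidin-HRP at 1:5000 in 25 ml
    print(stock_volume_ul(25, 10000))   # StrepTactin-HRP at 1:10,000 in 25 ml
```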

Positive test results (Table 5) for E-PHA binding were rated as 0 (negative), 1 (one plus, 
weak), 2 (two pluses, moderately strong) or 3 (three pluses, strongest rating). Callus of events rated 2 
or 3 was selected to produce sample for mass analysis. Samples 25, 26, 33, 48, 55, 56 and 59 were 
pooled to produce the protein extract for MALDI-TOF analysis of glycan substructures. A gel blot 
example (Figure 12) shows samples 19 through 27. 
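The selection rule stated above (keep events rated 2 or 3) amounts to a threshold filter over the ratings. The sketch below applies it to a handful of legible entries from Table 5, with "Unclear" represented as None and skipped; the dictionary and function names are hypothetical.

```python
from typing import Dict, List, Optional

# A few legible entries from Table 5: event ID -> E-PHA rating (None = "Unclear").
RATINGS: Dict[int, Optional[int]] = {
    1: None, 2: None, 3: 1, 23: 1, 24: 1, 25: 3, 26: 3, 27: 0, 33: 2, 34: 1,
}


def select_for_pooling(ratings: Dict[int, Optional[int]], threshold: int = 2) -> List[int]:
    """Sorted event IDs rated at or above `threshold`; unclear ratings are excluded."""
    return sorted(event for event, rating in ratings.items()
                  if rating is not None and rating >= threshold)


if __name__ == "__main__":
    print(select_for_pooling(RATINGS))
```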

Example 10 

Test transgenic maize callus for c-myc epitope expression. Callus samples from 100 
individually isolated unique transgenic events were extracted as follows. Samples from each event 
were fresh frozen in 96-well cluster tube boxes (Costar 1.2 ml polypropylene, with lid) along with a 
steel and a tungsten bead in each well. 450 ul of extraction buffer (25 mM sodium phosphate pH 6.6, 
100 mM NaCl, 30 mM sodium bisulfate, 1 % v/v Triton X-100) was added per well, and the box of 
samples was pulverized for 3 minutes at full speed on a Kleco Bead Mill. The plate was centrifuged (4 
°C) at 2500 rpm for 10 minutes. Extracts were removed to a 96-well deep well plate and frozen for 
storage. All screening assays were performed on these extracts of individual events. 

Protein analyses (microtiter plate protocol, BioRad 500-0006) were made to determine the 
total protein for each extract. 25 ug protein per sample were loaded in 20 ul loading buffer (Laemmli, 
U.K., Nature 227:680, 1970). Gels (4-20 % Criterion PAGE gels, 12+2 wells per gel, BioRad 345- 
0032) were electrophoresed at 65 mA in Tris/glycine/SDS running buffer (BioRad 161-0772). 
After soaking in transfer buffer (running buffer plus 20 % methanol) for 10 minutes, the gels were 
transferred to nitrocellulose membranes using a semi-dry blotter (150 mA/1.5 hrs). The membranes 
were incubated for 30 minutes in blocking buffer (20 mM Tris, 144 mM NaCl, 0.5 % v/v Tween 20, 
10 % w/v dry milk) at room temperature; then the blocking buffer was removed and replaced with 
the primary detection reagent, mouse anti-c-myc clone 9E10 (Sigma M5546), at 1 ug/ml in blocking 
buffer. After 1 hour of incubation at room temperature, the primary detection reagent was removed 
and the membrane was rinsed with blocking buffer. The secondary detection reagent, anti-mouse-HRP 
(BioRad 170-6516) at 1:10,000, plus a molecular weight marker detection reagent (StrepTactin-HRP, 
BioRad 161-0380) at 1:10,000 in blocking buffer, was then added and incubated on the 
membrane for 1 hour at room temperature. During the blocking, primary, and secondary reagent 
steps the solutions were mixed on the blots. The secondary detection reagent was removed, and the 
membrane was rinsed three times with Tris buffered saline (20 mM Tris, 144 mM NaCl) containing 
0.5 % Tween 20 for 10 minutes each, plus another 5 minute rinse. After draining off the excess rinse 
solution, the membrane was soaked in ECL reagent (Pierce 34080) for 1 minute, drained, and then 
exposed to film. 

As detailed above, callus samples from independent events 1-100 were screened for 
expression of the c-myc epitope. Samples 3, 11, 12, 26, 31, 55 and 64 showed the presence of a 
band in the predicted molecular weight range of 50-55 kilodaltons. These callus samples were 
pooled to produce a protein sample for glycan analysis by MALDI-TOF. A representative blot is 
shown in Figure 13. 
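Deciding whether a band falls in a predicted 50-55 kDa window is commonly done by fitting log10(molecular weight) against relative mobility (Rf) of the marker bands and interpolating. The sketch below shows that least-squares interpolation; the marker ladder, the band mobility of 0.50 and the function names are hypothetical and are not measurements from Figure 13.

```python
import math
from typing import List, Tuple

# Hypothetical marker ladder: (size in kDa, relative mobility Rf).
MARKERS: List[Tuple[float, float]] = [
    (250, 0.10), (150, 0.22), (100, 0.32), (75, 0.40), (50, 0.52), (37, 0.60), (25, 0.72),
]


def fit_log_mw(markers: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of log10(kDa) = intercept + slope * Rf; returns (slope, intercept)."""
    xs = [rf for _, rf in markers]
    ys = [math.log10(kda) for kda, _ in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_kda(rf: float, fit: Tuple[float, float]) -> float:
    """Interpolated size (kDa) of a band with mobility `rf`."""
    slope, intercept = fit
    return 10 ** (intercept + slope * rf)


def in_window(kda: float, low: float = 50.0, high: float = 55.0) -> bool:
    return low <= kda <= high


if __name__ == "__main__":
    fit = fit_log_mw(MARKERS)
    size = estimate_kda(0.50, fit)
    print(f"estimated size {size:.1f} kDa, in 50-55 kDa window: {in_window(size)}")
```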

Example 11 

Preparation of extract for mass spec analysis of glycans. The samples were prepared from 
combined calluses of several maize callus events that tested positive for GnTIII transgene 
expression based on lectin blotting using E-PHA. Callus tissue was collected fresh and stored frozen 
at -80 °C, then ground to a fine powder in liquid nitrogen. Weighed sample was added to extraction 
buffer (5 mM EDTA, 0.5 mM PMSF, 20 mM sodium bisulfite, 150 mM sodium phosphate buffer pH 
7.4, and 0.4 mM PVPP soluble MW 40,000) and stirred for 30 minutes at 4 °C. After centrifugation 
at 5000 x g at 4 °C, the supernatant was collected. Ammonium sulfate and wash buffer (5 mM 
EDTA, 150 mM sodium phosphate buffer, pH 7.4) were added to the supernatant to achieve a final 
concentration of 20 % (w/v) ammonium sulfate. After centrifugation for 5 minutes at 5000 x g at 4 °C, 
the supernatant was transferred to a fresh tube, and additional ammonium sulfate plus wash buffer 
were added to achieve 60 % (w/v) ammonium sulfate. This preparation was stirred overnight at 4 °C, 
then centrifuged for 20 minutes at 10,000 x g. The pellet was recovered in 5 ml of wash buffer, 
frozen at -80 °C, then lyophilized at 4 °C until dry. Samples were sent to the lab for glycan analysis. 
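Making up the extraction buffer is a molarity-to-mass calculation (grams = mM / 1000 x g/mol x litres). In the sketch below, the salt forms and their molecular weights (disodium EDTA dihydrate, sodium bisulfite) are assumptions about the reagents used; the polymer weight is the MW 40,000 stated in the text. The sodium phosphate component is omitted because its mass depends on the monobasic/dibasic ratio chosen to reach pH 7.4.

```python
from typing import Dict

# Molecular weights (g/mol) for the reagent forms assumed here; check the actual reagent labels.
MW: Dict[str, float] = {
    "EDTA (as Na2EDTA.2H2O)": 372.24,
    "PMSF": 174.19,
    "sodium bisulfite (NaHSO3)": 104.06,
    "PVP(P), MW 40,000 as stated": 40_000.0,
}

# Final concentrations (mM) from the extraction buffer recipe above.
RECIPE_MM: Dict[str, float] = {
    "EDTA (as Na2EDTA.2H2O)": 5.0,
    "PMSF": 0.5,
    "sodium bisulfite (NaHSO3)": 20.0,
    "PVP(P), MW 40,000 as stated": 0.4,
}


def grams_needed(conc_mM: float, mw_g_per_mol: float, volume_l: float) -> float:
    """Mass (g) of a component for `volume_l` litres at `conc_mM` millimolar."""
    return conc_mM / 1000.0 * mw_g_per_mol * volume_l


if __name__ == "__main__":
    for name, mM in RECIPE_MM.items():
        print(f"{name}: {grams_needed(mM, MW[name], 1.0):.4f} g per litre")
```

Note that 0.4 mM of a 40,000 g/mol polymer corresponds to 16 g per litre, i.e. about 1.6 % (w/v).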



Example 12 

Maize plant regeneration from transgenic callus tissue. For plant regeneration from 
transformed callus, tissue was placed onto regeneration media containing MS basal salts and 
vitamins (Murashige T. and F. Skoog, Physiol. Plant. 15:473-497, 1962), 30 g/L sucrose, 5 mg/L 
6-benzylaminopurine (BA), 0.025 mg/L 2,4-dichlorophenoxyacetic acid (2,4-D), 1 mg/L Herbiace (a 
commercial formulation of 20 % bialaphos, Meiji Seika, Tokyo, Japan), and 2.5 g/L Gelrite, pH 5.7. 
Cultures were grown in the light. When shoots reached 1-3 cm in length, they were transferred into 
vessels containing SH basal salts and vitamins (Schenk R. and A.C. Hildebrandt, Can. J. Bot. 50:199- 
204, 1972), 10 g/L sucrose, 1 g/L myo-inositol, and 2.5 g/L Gelrite, pH 5.8. 

Plants were screened for expression of GNTIII by altered binding of the lectin E-PHA to 
endogenous proteins. Samples were screened for E-PHA binding as described in Example 9, 
supra. The protein extract and 20 % / 60 % ammonium sulfate precipitate were prepared exactly as for 
the callus samples described in Example 11, supra. One plant from each of 23 independent events 
was screened by lectin blotting for the results of expression of the GNTIII gene. Four of these 
events gave positive signals for E-PHA binding. These four events had also tested positive at the 
callus stage. Plants regenerated from those four events were pooled to produce a protein extract for 
glycan analysis by MALDI-TOF. 

Example 13 

Oligosaccharide distributions and level of bisected complex oligosaccharides in wildtype 
and selected transgenic corn calli. Endogenous glycoproteins were isolated from control corn calli 
and selected corn calli expressing GnTIII based on lectin blotting using E-PHA. In addition, the 
present invention also contemplates the extraction of c-myc tagged samples. E-PHA and c-myc 
tagged samples may be callus, plant cells, plant tissues or entire plants as defined in the definitions 
section, supra. A comparison of the structures of the N-glycans isolated from glycoproteins present 
in calli is presented in Figures 9A and 9B. MALDI-TOF allowed for the detection of different 
molecular species in the pool of the N-glycans (glycoforms) and showed a mixture of ions that were 
assigned to (M+Na)+ adducts of high-mannose (Man)-type N-glycans ranging from d, Man5 to l, 
Man8 and of mature N-glycans from the truncated structure a, XM3GN2 to m, bGN3FXM3GN2 
(see Table 3). In addition to the N-glycans characterized in the control callus (Figure 9A), the 
MALDI-TOF MS of the glycan mixture from selected corn calli expressing GnTIII (Figure 9B) 
showed at least one ion assigned to N-linked glycans that result from the action of the human GnTIII 
enzyme (for a comparison see Table 3). This oligosaccharide, bGN3FXM3GN2 (m), represents 20 % of 
the population and contains three GlcNAc residues, each linked to one of the three mannoses of the 
trimannosyl core structure of the N-linked glycan. Analysis of glycan structure through MALDI-TOF 
as performed here cannot distinguish between GlcNAc residues β(1,2)- or β(1,4)-linked to mannose. 
Hence it is not clear if or to what extent the structures GN2XM3GN2 (h) and GN2FXM3GN2 (k) 
have bisecting oligosaccharides. Both were more abundant in GnTIII corn cells than in 
untransformed control corn cells. Additional experiments are required to reveal whether these structures 
are a mix of normal and bisected oligosaccharides or a single compound. 
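The m/z values in Tables 3, 4 and 6 can be reproduced from the structure abbreviations using nominal (integer) residue masses: hexose 162, N-acetylhexosamine 203, deoxyhexose (fucose) 146 and pentose (xylose) 132, plus 18 for water and 23 for sodium in the (M+Na)+ ion. The sketch below assumes an abbreviation grammar inferred from the tables (an optional leading b for bisecting, GN<n> for GlcNAc beyond the GN2 core, F, X, and M<n> for mannose); the function names are hypothetical.

```python
import re
from typing import Dict

# Nominal residue masses (Da); an assumption that reproduces the integer m/z values in the tables.
RESIDUE_MASS = {"Hex": 162, "HexNAc": 203, "dHex": 146, "Pent": 132}
WATER, SODIUM = 18, 23

# Inferred grammar: [b][GN<n>][F][X]M<n>GN2, where a bare "GN" means one extra GlcNAc.
ABBREVIATION = re.compile(r"^b?(?:GN(\d*))?(F?)(X?)M(\d+)GN2$")


def composition(name: str) -> Dict[str, int]:
    """Monosaccharide composition implied by an abbreviation such as 'GN2FXM3GN2'."""
    m = ABBREVIATION.match(name)
    if m is None:
        raise ValueError(f"unrecognized glycan abbreviation: {name}")
    gn, fuc, xyl, man = m.groups()
    extra_glcnac = 0 if gn is None else int(gn or 1)  # group absent -> 0; "GN" alone -> 1
    return {"Hex": int(man), "HexNAc": 2 + extra_glcnac,
            "dHex": 1 if fuc else 0, "Pent": 1 if xyl else 0}


def nominal_mz_na(name: str) -> int:
    """Nominal m/z of the singly charged (M+Na)+ ion of the free glycan."""
    comp = composition(name)
    return WATER + SODIUM + sum(RESIDUE_MASS[k] * n for k, n in comp.items())


if __name__ == "__main__":
    for name in ("XM3GN2", "FXM3GN2", "M5GN2", "GN2FXM3GN2", "bGN2FXM3GN2", "bGN3FXM3GN2"):
        print(name, nominal_mz_na(name))
```

GN2FXM3GN2 and bGN2FXM3GN2 both come out at 1617: because the leading b changes linkage but not composition, a β(1,2)-linked and a bisecting β(1,4)-linked GlcNAc give the same mass, which is why MALDI-TOF alone cannot tell these isomers apart.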

Besides the new appearance of saccharide structure m (bGN3FXM3GN2) in GnTIII corn, it 
is apparent from the comparison of the glycoforms of control and GnTIII corn, as shown in Table 3, 
that the amount of structures harbouring high-mannose type N-glycans (M4 and higher) is reduced 
more than twofold (from 19 % to 7 %), which can be attributed mostly to the reduction of M4- 
containing N-glycans (from 10 % to 1 % of total) in GnTIII corn versus control corn. In addition, the 
amount of glycoforms having two or more GlcNAc residues has increased from 16 % to 42 % 
(control versus GnTIII). 

In a follow-up experiment, endogenous glycoproteins were isolated from control corn calli 
and selected corn calli expressing GnTIII based on analysis for the presence of the c-myc tag sequence 
by Western blotting. A comparison of the structures of the N-glycans isolated from glycoproteins 
present in calli is presented in Table 4. MALDI-TOF allows for the detection of different molecular 
species in the pool of the N-glycans (glycoforms) and shows a mixture of ions that were assigned to 
(M+Na)+ adducts of high-mannose (Man)-type N-glycans ranging from d, Man5 to l, Man8 and of 
mature N-glycans from the truncated structure a, XM3GN2 to k, GN2FXM3GN2 in control corn. 

Remarkably, in transgenic corn expressing GnTIII (Table 4, GnTIII-2), only three isoforms 
could be detected: FXM3GN2 (b; accounting for 9 % of total), GN2FXM3GN2/bGN2FXM3GN2 
(k; 38 %) and bGN3FXM3GN2 (m; 54 %). It is not clear if or to what extent the structure depicted 
as k (GN2FXM3GN2/bGN2FXM3GN2) has bisecting oligosaccharides. Its level is significantly 
increased in GnTIII corn compared to control corn. Additional experiments are required to reveal 
whether these structures are a mix of normal and bisected oligosaccharides or a single compound. 

Besides the new appearance of saccharide structure m (bGN3FXM3GN2) in GnTIII corn (54 
%), it is apparent from the comparison of the glycoforms of control and GnTIII corn, as summarized 
in Table 4, that the amount of structures harbouring high-mannose type N-glycans (M4 and higher) is 
reduced to nil in GnTIII corn versus control corn. Furthermore, the total amount of N-glycans 
bearing 2 or more (3) GlcNAc residues has increased from 16 % to 92 % (control versus GnTIII), 
suggesting that the introduction of a bisecting GlcNAc residue protects the glycan from degradation by 
endogenous hexosaminidases, as observed before for transgenic GnTIII tobacco. 
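The 16 % and 92 % figures are sums over the Table 4 rows whose glycans carry at least two GlcNAc residues beyond the GN2 core (every structure has the core, so "two or more GlcNAc residues" is read here as non-core GlcNAc). The sketch below transcribes the Table 4 percentages (blank cells omitted) and recomputes both sums; the variable and function names, and that reading of the text, are assumptions.

```python
import re
from typing import Dict

# Percent of total N-glycan pool, transcribed from Table 4 (blank cells omitted).
TABLE4: Dict[str, Dict[str, int]] = {
    "control": {
        "XM3GN2": 1, "FXM3GN2": 37, "XM4GN2": 6, "M5GN2": 1, "GNFXM3GN2": 12,
        "M6GN2": 5, "GNXM4GN2": 3, "GNFXM4GN2": 1, "M7GN2": 1, "GN2FXM3GN2": 16,
        "M8GN2": 1,
    },
    "GnTIII-2": {"FXM3GN2": 9, "GN2FXM3GN2": 38, "bGN3FXM3GN2": 54},
}

# A leading (b)GN<n> counts GlcNAc beyond the GN2 core; a bare "GN" means one.
NON_CORE_GLCNAC = re.compile(r"^b?GN(\d*)")


def non_core_glcnac(name: str) -> int:
    """Number of GlcNAc residues outside the chitobiose (GN2) core."""
    m = NON_CORE_GLCNAC.match(name)
    return 0 if m is None else int(m.group(1) or 1)


def share_with_at_least(rows: Dict[str, int], minimum: int = 2) -> int:
    """Summed percentage of glycoforms carrying >= `minimum` non-core GlcNAc residues."""
    return sum(pct for name, pct in rows.items() if non_core_glcnac(name) >= minimum)


if __name__ == "__main__":
    for line, rows in TABLE4.items():
        print(f"{line}: total {sum(rows.values())} %, "
              f">=2 non-core GlcNAc {share_with_at_least(rows)} %")
```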

Additionally, MALDI-TOF mass spectroscopy data (Figure 11) demonstrate the bisected 
GlcNAc structure. 
Example 14 

Oligosaccharide distributions and level of bisected complex oligosaccharides in wildtype 
and selected transgenic corn plants. Endogenous glycoproteins were isolated from control corn 
plant leaves and selected corn plant leaves expressing GnTIII based on analysis for the presence of the 
c-myc tag sequence by Western blotting or lectin blotting using E-PHA. A comparison of the structures 
of the N-glycans isolated from glycoproteins present in leaves is presented in Table 6. MALDI-TOF 
allows for the detection of different molecular species in the pool of the N-glycans (glycoforms) and 
shows a mixture of ions that were assigned to (M+Na)+ or (M+K)+ adducts of high-mannose (Man)- 
type N-glycans ranging from f, Man6 to h, Man8 and of mature N-glycans from the truncated 
structure a, XM3GN2 to g, GN2FXM3GN2 in the control corn plant and to i, bGN3FXM3GN2 in the 
transgenic GnTIII corn plant. 

In addition to the N-glycans characterized in the control plants (Figure 14A), the MALDI-TOF 
MS of the glycan mixture from selected corn plants expressing GnTIII (Figure 14B) showed at least one 
ion assigned to N-linked glycans that result from the action of the human GnTIII enzyme (for a 
comparison see Table 6). This oligosaccharide, bGN3FXM3GN2 (i), represents 15 % of the population 
and contains three GlcNAc residues, each linked to one of the three mannoses of the trimannosyl core 
structure of the N-linked glycan. Analysis of glycan structure through MALDI-TOF as performed 
here cannot distinguish between GlcNAc residues β(1,2)- or β(1,4)-linked to mannose. Hence it is 
not clear if or to what extent the structure GN2FXM3GN2 (g) has bisecting oligosaccharides. Its level 
increased in GnTIII corn compared to the control corn plant (23 % versus 5 % in control). Additional 
experiments are required to reveal whether these structures are a mix of normal and bisected 
oligosaccharides or a single compound. Besides this, it is apparent from the comparison of the 
glycoforms of control and GnTIII corn plants, as depicted in Table 6, that the amount of structures 
harbouring FXM3GN2 is reduced twofold (from 59 % to 30 %) and the amount of glycoforms having two 
or more GlcNAc residues has increased from 5 % to 38 % (control versus GnTIII). 

Additionally, Figure 14 shows a comparison of MALDI-TOF mass spectra of N-glycans of 
glycoproteins isolated from leaves of control corn (A) and of selected GnTIII corn plants (B). GnTIII 
corn plants were obtained through transformation with the human GnTIII gene sequence, and selection 
was performed by Western blotting using either the c-myc tag or E-PHA lectin. See Table 6 for 
structures and abbreviations. 



It is understood that the present invention is not limited to the particular methodology, 
protocols, cell lines, vectors, and reagents, etc., described herein, as these may vary. It is also to be 
understood that the terminology used herein is used for the purpose of describing particular 
embodiments only, and is not intended to limit the scope of the present invention. It must be noted 
that, as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural 
reference unless the context clearly dictates otherwise. 

Unless defined otherwise, all technical and scientific terms used herein have the same 
meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. 

The invention described and claimed herein is not to be limited in scope by the specific 
embodiments herein disclosed, since these embodiments are intended as illustrations of several 
aspects of the invention. Any equivalent embodiments are intended to be within the scope of this 
invention. Indeed, various modifications of the invention in addition to those shown and described 
herein will become apparent to those skilled in the art from the foregoing description. Such 
modifications are also intended to fall within the scope of the appended claims. 

Various references are cited herein, the disclosures of which are incorporated by reference in 
their entireties. 
Table 1. Structure, molecular weight and percentage of total pool of N-glycans isolated from control 
and selected GnTIII-17 plants. 
Table 2. Comparison of the results of mass spec (MALDI-TOF) analysis of N-glycans of endogenous 
glycoproteins isolated from control tobacco and selected GnTIII-17 plants. See also Table 1. 
Table 3. Overview of N-glycans observed in control and transgenic GnTIII corn. 
Comparison of N-glycan structures (% of total) found on endogenous glycoproteins of control, 
untransformed corn and transgenic corn callus expressing GnTIII that could be annotated. 
Corresponding mass spectra obtained through MALDI-TOF analyses are given below and 
saccharides are indicated under column "name." Bisecting GlcNAc residues are depicted as bGN. 

Structure abbreviation      m/z    name   Corn callus
                                          control   GnTIII
XM4GN2                      1227   c      6         1
M5GN2                       1257   d      1         1
GNFXM3GN2                   1414   e                1
M6GN2                       1419   f      5         4
GNXM4GN2                    1430   g      3
GN2XM3GN2 / bGN2XM3GN2      1471   h                3
GNFXM4GN2                   1576   i      1
M7GN2                       1581   j      1         1
GN2FXM3GN2 / bGN2FXM3GN2    1617   k      16        19
M8GN2                       1743   l      1
bGN3FXM3GN2                 1820   m                20
Total                                     84        93
Table 4. Schematic overview of N-glycans observed in control and transgenic GnTIII corn-2. 

Comparison of N-glycan structures (% of total) found on endogenous glycoproteins of control, 
untransformed corn and transgenic corn callus expressing GnTIII that could be annotated. 
Transgenic corn was selected using the c-myc tag. Corresponding mass spectra obtained through 
MALDI-TOF analyses are given below and saccharides are indicated under column "name". 
Bisecting GlcNAc residues are depicted as bGN. 



Structure abbreviation      m/z    name   Corn callus
                                          control   GnTIII-2
XM3GN2                      1065   a      1
FXM3GN2                     1211   b      37        9
XM4GN2                      1227   c      6
M5GN2                       1257   d      1
GNFXM3GN2                   1414   e      12
M6GN2                       1419   f      5
GNXM4GN2                    1430   g      3
GN2XM3GN2 / bGN2XM3GN2      1471   h
GNFXM4GN2                   1576   i      1
M7GN2                       1581   j      1
GN2FXM3GN2 / bGN2FXM3GN2    1617   k      16        38
M8GN2                       1743   l      1
bGN3FXM3GN2                 1820   m                54
Total                                     84        101
Table 5. Positive test results for E-PHA binding. 

ID number   Rating for E-PHA binding   Included in pooled positive sample
1.          Unclear
2.          Unclear
3.          1
4.          0
5.          0
6.          0
7.          0
8.          0
9.          0
10.         Unclear
11.         Unclear
12.         1
13.         1
14.         0
15.         0
16.         0
17.         0
18.         0
19.         0
20.         0
21.         0
22.         0
23.         1
24.         1
25.         3                          Yes
26.         3                          Yes
27.         0
28.         0
29.         0
30.         0
31.         0
32.         0
33.         2                          Yes
34.         1
35.         0
36.         0
37.         Unclear
38.         1
39.         0
40.         1
41.         1
42.         0
43.         0
44.         Unclear
45.         0
46.         0
47.         0
48.                                    Yes
49.         0
50.         0
51.         0
52.         0
53.         0
54.         1
55.                                    Yes
56.         2                          Yes
57.         2                          Yes
58.         1
59.         2                          Yes
60.         0
61.         0
62.         0
63.         0
64.         0
65.         0
66.         0
67.         0
68.         0
69.         Negative control





Table 6. Schematic overview of N-glycans observed in control and transgenic GnTIII corn 
plants. 



Structure abbreviation      m/z    name   Corn plant
                                          control   GnTIII
XM3GN2                      1065   a      4         14
FM3GN2                      1079   b      2
FXM3GN2                     1211   c      59        30
XM4GN2                      1227   d      3         12
GNFXM3GN2                   1414   e      10        6
M6GN2                       1419   f      2
GN2FXM3GN2 / bGN2FXM3GN2    1617   g      5         23
M8GN2                       1743   h      1
bGN3FXM3GN2                 1820   i                15
Total                                     86        100
What is Claimed is: 

1. A plant host cell system, comprising a mammalian UDP-N-acetylglucosamine:β-D-mannoside 
β(1,4)-N-acetylglucosaminyltransferase (GnTIII) enzyme. 

2. The plant host system according to Claim 1, wherein said GnTIII is a human GnTIII. 

3. The plant host system according to Claim 1, wherein said system is a portion of a plant. 

4. The plant host system according to Claim 1, wherein said system is a portion of a plant selected 
from the group consisting of a cell, leaf, embryo, callus, stem, pericarp, protoplast, root, tuber, 
kernel, endosperm and embryo. 


5. The plant host system according to Claim 1, wherein said system is a whole plant. 

6. The plant host system according to Claim 1, further comprising a heterologous glycoprotein. 

7. The plant host system according to Claim 5, wherein said heterologous glycoprotein 
comprises an antibody or fragment thereof. 

8. The plant host system according to Claim 5, wherein said heterologous glycoprotein or 
functional fragment thereof comprises bisected oligosaccharides. 


9. The plant host system according to Claim 5, wherein said heterologous glycoprotein comprises 
bisected glycans with galactose residues. 

10. The plant host system according to Claim 1, wherein said plant is a tobacco plant. 


11. The plant host system according to Claim 1, which further comprises a functional protein 
selected from the group consisting of a transporter and an enzyme providing N-glycan 
biosynthesis. 

12. The plant host system according to Claim 11, wherein said enzyme is a (human) β-1,4- 
galactosyltransferase. 

13. The plant host system according to Claim 11, which further comprises a heterologous 
glycoprotein having an increased number of galactose residues. 
14. A plant host system comprising a nucleic acid sequence encoding a mammalian GnTIII protein. 

15. A plant host system comprising a vector comprising a nucleic acid sequence encoding a 
mammalian GnTIII protein. 

16. The plant host system according to Claim 15, which further comprises a nucleic acid sequence 
encoding a functional protein selected from a group consisting of a transporter and an enzyme 
providing N-glycan biosynthesis. 


17. A method comprising a) crossing a plant expressing a heterologous glycoprotein with a plant 
according to Claim 5, b) harvesting progeny from said crossing and c) selecting a desired 
progeny plant. 

18. The method according to Claim 17, wherein said desired progeny plant expresses said 
heterologous glycoprotein having bisected oligosaccharides. 

19. The method according to Claim 17, wherein said plant host system is a transgenic plant. 

20. A method for obtaining a heterologous glycoprotein having bisected oligosaccharides
comprising introducing a nucleic acid sequence encoding GnTIII that is normally not present in
plants into a plant host system and a nucleic acid sequence encoding a heterologous glycoprotein
and isolating said heterologous glycoprotein.

21. The method according to Claim 20, wherein said nucleic acid sequences are introduced into a
plant cell and said plant cell is regenerated into a plant.

22. The method according to Claim 20, wherein said nucleic acid sequences are introduced into a
plant host system by transforming said plant host system with a vector comprising a nucleic acid
sequence encoding GnTIII that is normally not present in plants into a plant and a nucleic acid
sequence encoding a heterologous glycoprotein.

23. The method according to Claim 20, wherein said nucleic acid sequences are introduced into a
plant host system by transforming said plant host system with a vector comprising a nucleic acid
sequence encoding GnTIII that is normally not present in plants into a plant and a nucleic acid
sequence encoding a heterologous glycoprotein.

24. The method according to Claim 20, wherein said nucleic acid sequences are introduced into a
plant host system by transforming said plant with a vector comprising a nucleic acid sequence
encoding GnTIII that is normally not present in plants into a plant host system and a vector
comprising a nucleic acid sequence encoding a heterologous glycoprotein.

25. A method for obtaining a heterologous glycoprotein having bisected oligosaccharides
comprising cultivating the regenerated plant obtained in Claim 21.

26. A method for obtaining a desired glycoprotein comprising a) cultivating the plant host system of
Claim 6 and b) harvesting and fractionating said plant.

27. A plant obtainable by a method according to Claim 19.

28. A method for obtaining a plant host system comprising a) crossing a plant comprising a
functional protein selected from a group consisting of a transporter or an enzyme providing
N-glycan biosynthesis with a plant according to Claim 5, b) harvesting progeny from said crossing
and c) selecting a desired progeny.

29. A transgenic plant obtained according to the method of Claim 28. 

30. A method for increasing galactosylation of a heterologous glycoprotein expressed in a plant host
system comprising a) introducing a nucleic acid sequence encoding GnTIII and a sequence
selected from a group consisting of a transporter or an enzyme not normally present in a plant
into said plant host system expressing said heterologous glycoprotein and b) isolating said
glycoprotein.

31. A plant-derived glycoprotein comprising bisected oligosaccharides.

32. Use of a plant host system according to Claims 1-16 to produce a desired glycoprotein or 
functional fragment thereof. 

33. Use according to Claim 32, wherein said glycoprotein or functional fragment thereof comprises
bisected oligosaccharides.

34. A plant-derived glycoprotein or functional fragment thereof obtained by a method according to 
Claim 19. 


35. Use of a glycoprotein or functional fragment thereof according to Claim 31 for the production of 
a pharmaceutical composition. 

36. A composition comprising a glycoprotein or functional fragment thereof according to Claim 31.
37. An isolated hybrid protein comprising an active site of GnTIII and a transmembrane region of a
protein, said protein residing in endoplasmic reticulum or Golgi apparatus of a eukaryotic cell.

38. The protein according to Claim 37, wherein said protein residing in endoplasmic reticulum or
Golgi apparatus of a eukaryotic cell is an enzyme.

39. The protein according to Claim 37, wherein said protein residing in endoplasmic reticulum or
Golgi apparatus of a eukaryotic cell is a glycosyltransferase. 


40. The protein according to Claim 37, wherein said protein residing in endoplasmic reticulum or
Golgi apparatus of a eukaryotic cell is a glycosyltransferase selected from the group consisting
of mannosidase I, mannosidase II, GnTI, GnTII, XylT and FucT.

41. The protein according to Claim 37, wherein said protein residing in endoplasmic reticulum or
Golgi apparatus of a eukaryotic cell is a plant protein. 

42. An isolated nucleic acid sequence encoding the protein of Claim 31. 

43. A vector comprising the isolated nucleic acid sequence of Claim 42.

44. A plant comprising the isolated nucleic acid sequence of Claim 42. 

45. The plant according to Claim 44, which further comprises a nucleic acid sequence encoding a
heterologous glycoprotein.

46. A method for providing a transgenic plant capable of expressing a heterologous glycoprotein
with the capacity to extend an N-linked glycan with galactose comprising crossing a transgenic
plant with a plant according to Claim 44, harvesting progeny from said crossing and selecting a
desired progeny plant expressing said recombinant protein and expressing a functional
(mammalian) enzyme involved in (mammalian) N-glycan biosynthesis that is normally not
present in plants.

47. A method for providing a transgenic plant capable of expressing a heterologous glycoprotein
with the capacity to extend an N-linked glycan with galactose comprising introducing the
nucleic acid sequence of Claim 42 and a nucleic acid sequence encoding said heterologous
glycoprotein.

WO 03/078614 PCT/IB03/01562

48. A method, comprising:

a. providing: i) a plant cell, and ii) an expression vector comprising nucleic acid encoding a
GNTIII enzyme; and

b. introducing said expression vector into said plant cell under conditions such that said enzyme
is expressed.


49. The method of Claim 22, wherein said nucleic acid encoding a GNTIII comprises the nucleic
acid sequence of SEQ ID NO: 1.



50. A method, comprising:

a. providing: i) a plant cell, ii) a first expression vector comprising nucleic acid encoding a
GNTIII enzyme, and iii) a second expression vector comprising nucleic acid encoding a
heterologous glycoprotein; and
b. introducing said first and second expression vectors into said plant cell under conditions such
that said hybrid enzyme and said heterologous protein are expressed.


51. The method of Claim 50, wherein said heterologous protein is an antibody or antibody fragment.



52. A method, comprising:

a) providing: i) a first plant comprising a first expression vector, said first vector comprising
nucleic acid encoding a GNTIII enzyme, and ii) a second plant comprising a second
expression vector, said second vector comprising nucleic acid encoding a heterologous
protein; and

b) crossing said first plant and said second plant to produce progeny expressing said hybrid
enzyme and said heterologous protein.


53. A plant, comprising first and second expression vectors, said first vector comprising nucleic acid 
encoding a GNTIII enzyme, said second vector comprising nucleic acid encoding a heterologous 
protein. 



54. The plant of Claim 53, wherein said heterologous protein is an antibody or antibody fragment.
FIG. 1A
[Figure: panels a, b and c; peak labeled 1837.0868]



SUBSTITUTE SHEET (RULE 26) 






[Figure: glycan structures M9 and M5; arrow labeled 4x Man I]
FIG. 2
Nucleotide sequence (incl. myc-tag)

rrATnQT GATGAGACGCTACAAGCTCTTTCTCATGTTCTGTAT GGCCQGCCT^ . 

TGCCTCATGTCCTTCCTGOVCTTCITCAAGACCCTGTCCTATGTCA 

CGAGAACTGGCCTCCCTa^GCCCTAAC CTGGTGTCCAGCTTTTTCTGG^ 

TGCCCCGGTCACGCGCCAGGCCAGCCCCGAGCCAGGAGGCCCTGACCTGCT^ 

gGTACCCCACTCTACTCCCACTCGCCCCTGOTQCAGCCGCTGCCG^ 

GGCGGCCGAGQAGCTCCACCGGGTGGAOTTGGTGCTGCCCGAGGACACCACG 

GAGTATTTGQTQCGCT^CCAAGGCCGQCGGCGTCTGCTTCAAACCCGGCACCA 

AGATGCTGGAGAGGCCGCCCCCGGGACGGCCGGAGGAGAAGCCTGAGGGGG 

CCAACGGCTCCTCGGCCCGGCGGCCACCCCGGTACCTCCTGAGCGCCCGGGA 

GCGCACGGGGGGCCGAGGCGCCCGGCGCAAGTGGGTGGAGTGCGTGTGCCT 

GCCCGGCTGGCACGGACCCAGCTGCGGCGTGCCCACTGTGGTGCAGTACTCC 

AACCTGCCCACCAAGGAGCGGCTGGTGCCCAGGGAGGTGCCGCGCCGCGTCA 

TCAACGCCATCAACGTCAACCACGAGTTCGACCTGCTGGACGTGCGCTTCCA 

CQAGCTGGGGGAGQTGGTGQACGCCTTT GTGGTGTGCGAGTCCAACTTCACG 

GCTTATGGGGAGCCGCGGCCGCT(^QTT CCGGGAGATGCTGACCAATGGCA 

CCTTCGAGTAC^TCCGCGACAAGGTQCTCTATGTCTTCCTGGA CCAC^ 

CCCGGCGGCCGGCAGGACGGCTGGATCGGCGACGACTACCTGCGCACCTTCC 

TCACCCAGGACGGCGTCTCGCGGCTGCGCAACCTGCGGCCCGACGACGTCTT 

CATCATTGACGATGCGGACGAGATCCCGGCCCGTGACGGCGTCCTTTTCCTCA 

AGCTCTACGATGGCTGGACCGAGCCCTTCGCCTTCCACATGCGCAAGTCGCTC 

TACGGCTTCTTCTGGAAGCAGCCGGGCACCCTGQAGGTGGTGTCAGGCTGCA 

CGGTGGACATGCTGCAGGCAGTGTATGGGCTGGACGGCATCC GCCTGCGCCG 

CCGCCAGTACTACACCATGCCCAACTTCAGACAGTATGAGAACCGCACCGGC 

CACATCCTGGTGCAGTGGTCGCTGGGCAGCCCCCTGCACTTCG CCGGCTGGC 

ACTGCTCCTGGTGCTTCACGCCCGAGG GCATCTACrrTCAAQCTCGTGTCCGCC 

CAGAATGGCGAC!TTCCC:a.CGOTGGGGT GACTACGAGGACAAGCGG^ 

ACTACATCCGCGGCCTGATCCGCACCGGGGGCTGGTTCGACG GCACGCAGCA 

GGAGTACGCGCCTGCAGACCCCAGCGAGCACATGTATGCGCCCAAGTACCTG 

CTQAAGAACTACGACCGGTTCCACTACCTGCTGGACAACCCCTACCAQQAGC 

CCAGGAGCACGGCGGCGGGCGGGTGGCGCCACAQGGGTCCCGAGGGAAGGC 

CGCCCGCCCGGGGCAAACTQGACGAGGCGGAAGTCG AACAAAAACTCATCT 

CAGAAGAGGATCTGAATTAGGATCC 

FIG. 4A
PROTEIN SEQUENCE

M VMRT?YKLFI. MT7r!MAGLC!LT SFLHPFKT T.fi YVTFPREIAS LSPNLVSSFF 
WNNAPVTPOA SPBPGGPDLL RTPLYSHflPL LOP T.PPSKAA EELHRVDLVL 
PTCnTTBYFVB TKAGQVCFKP GTKMLERPP P flRPBEKPEGA NGSSARRPPR 
VT.T.fiARERTG fiPGARRKWVE CiVCLPQWHGP fiCGVPTWOY SNLPTKERLV 
PREVPRRVIN ATNVNHBFDT. LDVRFHELC iD WDAFWCES NFTAYGEPRP 
T..KFREMLTNG TFBYIRHKVL YVPLDHPPPG GROD GWIADD YLRTFLTODG 
VSRLRNLRPn DVFIIDDADE IPARDGVLPL KLYD GWTEPF AFHMRKSLYG 
FFWKOPQTT.E WSGCTVDML OAVYGLDGIR LRR ROYYTMP NPROYENRTG 
HTLVOWSLQfi PLHFAQWHCS WCFTPEG TYP KLVSAONQDF PRWGDYEDKR 
DLNYIRGLTT? TGQWFDGTOO EYPPAHPSEH MYA PKYLLKN YDRFHYLLDN 
PYOEPRSTAA GGWRHRGPEG RPPARGKLDE AEVE QKLISE EDLN 



FIG. 4B 






S^AB7119 6818bp 







1-38 Linker* 


CCTGCAGATCCCCGGGGATCCTCTAGAGTCGACCTG


39-2028 


-Maize Ubiquitin 1 promoter 


2029-2047 


GGGTACCCCCGGGGTCGAC 






2048-3695 


GNT III V.2 plus TAGGTTT 


2462 


C to replace G as reported in original sequence


3696-3699 PmeI


AAAC 


3700-4064 


Maize Per5 3' UTR


4065-6818 


pUC19 backbone 


4259-4264 


TGCGCA FspI


4720-5580 


Ampicillin Resistance gene 


5282-5287 


TGCGCA FspI



BamHI 



1 CCTGCAGATC CCCGGGGATC CTCTAGAGTC GACCTGCAGT 

GCAGGGTGAC CCGGTCGTGC CCCTCTCTAG AGATAATGAG 

CATTGCATGT CTAAGTTATA 
101 AAAAATTACC ACATATTTTT TTTGTCACAC TTGTTTGAAG 

TGCAGTTTAT CTATCTTTAT ACATATATTT AAACTTTAAT 

CTACGAATAA TATAATCTAT 
201 AGTACTACAA TAATATCAGT GTTTTAGAGA ATCATATAAA 

TGAACAGTTA GACATGGTCT AAAGGACAAT TGAGTATTTT 

GACAACAGGA CTCTACAGTT 
301 TTATCTTTTT AGTGTGCATG TGTTCTCCTT TTTTTTTGCA 

AATAGCTTCA CCTATATAAT ACTTCATCCA TTTTATTAGT 

ACATCCATTT AGGGTTTAGG 
401 GTTAATGGTT TTTATAGACT AATTTTTTTA GTACATCTAT 

TTTATTCTAT TTTAGCCTCT AAATTAAGAA AACTAAAACT 

CTATTTTAGT TTTTTTATTT 
501 AATAATTTAG ATATAAAATA GAATAAAATA AAGTGACTAA 

AAATTAAACA AATACCCTTT AAGAAATTAA AAAAACTAAG 

GAAACATTTT TCTTGTTTCG 
601 AGTAGATAAT GCCAGCCTGT TAAACGCCGT CGACGAGTCT 

AACGGACACC AACCAGCGAA CCAGCAGCGT CGCGTCGGGC 

CAAGCGAAGC AGACGGCACG 
701 GCATCTCTGT CGCTGCCTCT GGACCCCTCT CGAGAGTTCC 

GCTCCACCGT TGGACTTGCT CCGCTGTCGG CATCCAGAAA 

TTGCGTGGCG GAGCGGCAGA 
801 CGTGAGCCGG CACGGCAGGC GGCCTCCTCC TCCTCTCACG 

GCACGGCAGC TACGGGGGAT TCCTTTCCCA CCGGTCCTTC 

GCTTTCCCTT CCTCGCCCGC 



FIG. 6B
901 CGTAATAAAT AGACACCCCC TCCACACCCT CTTTCCCCAA

CCTCGTGTTG TTCGGAGCGC ACACACACAC AACCAGATCT

CCCCCAAATC CACCCGTCGG 
1001 CACCTCCGCT TCAAGGTACG CCGCTCGTCC TCCCCCCCCC 

CCCCTCTCTA CCTTCTCTAG ATCGGCGTTC CGGTCCATGC 

ATGGTTAGGG CCCGGTAGTT 
1101 CTACTTCTGT TCATGTTTGT GTTAGATCCG TGTTTGTGTT 

AGATCCGTGC TGCTAGCGTT CGTACACGGA TGCGACCTGT

ACGTCAGACA CGTTCTGATT 
1201 GCTAACTTGC CAGTGTTTCT CTTTGGGGAA TCCTGGGATG 

GCTCTAGCCG TTCCGCAGAC GGGATCGATT TCATGATTTT 

TTTTGTTTCG TTGCATAGGG 
1301 TTTGGTTTGC CCTTTTCCTT TATTTCAATA TATGCCGTGC 

ACTTGTTTGT CGGGTCATCT TTTCATGCTT TTTTTTGTCT 

TGGTTGTGAT GATGTGGTCT 
1401 GGTTGGGCGG TCGTTCTAGA TCGGAGTAGA ATTCTGTTTC 

AAACTACCTG GTGGATTTAT TAATTTTGGA TCTGTATGTG 

TGTGCCATAC ATATTCATAG 
1501 TTACGAATTG AAGATGATGG ATGGAAATAT CGATCTAGGA 

TAGGTATACA TGTTGATGCG GGTTTTACTG ATGCATATAC 

AGAGATGCTT TTTGTTCGCT 
1601 TGGTTGTGAT GATGTGGTGT GGTTGGGCGG TCGTTCATTC 

GTTCTAGATC GGAGTAGAAT ACTGTTTCAA ACTACCTGGT 

GTATTTATTA ATTTTGGAAC 
1701 TGTATGTGTG TGTCATACAT CTTCATAGTT ACGAGTTTAA 

GATGGATGGA AATATCGATC TAGGATAGGT ATACATGTTG 

ATGTGGGTTT TACTGATGCA 
1801 TATACATGAT GGCATATGCA GCATCTATTC ATATGCTCTA 

ACCTTGAGTA CCTATCTATT ATAATAAACA AGTATGTTTT 

ATAATTATTT TGATCTTGAT 
1901 ATACTTGGAT GATGGCATAT GCAGCAGCTA TATGTGGATT 

TTTTTAGCCC TGCCTTCATA CGCTATTTAT TTGCTTGGTA 

CTGTTTCTTT TGTCGATGCT 

NcoI
FseI



2001 CACCCTGTTG TTTGGTGTTA CTTCTGCAGG GTACCCCCGG 

GGTCGACCAT GGTGATGAGA CGCTACAAGC TCTTTCTCAT 

GTTCTGTATG GCCGGCCTGT 
2101 GCCTCATCTC CTTCCTGCAC TTCTTCAAGA CCCTGTCCTA 

TGTCACCTTC CCCCGAGAAC TGGCCTCCCT CAGCCCTAAC 

CTGGTGTCCA GCTTTTTCTG 
2201 GAACAATGCC CCGGTCACGC CCCAGGCCAG CCCCGAGCCA 

GGAGGCCCTG ACCTGCTGCG TACCCCACTC TACTCCCACT 

CGCCCCTGCT GCAGCCGCTG 



FIG. 6B CONT.
FspI



2301 CCGCCCAGCA AGGCGGCCGA GGAGCTCCAC CGGGTGGACT 

TGGTGCTGCC CGAGGACACC ACCGAGTATT TCGTGCGCAC

CAAGGCCGGC GGCGTCTGCT 
2401 TCAAACCCGG CACCAAGATG CTGGAGAGGC CCCCCCCGGG 

ACGGCCGGAG GAGAAGCCTG AGGGGGCCAA CGGCTCCTCG 

GCCCGGCGGC CACCCCGGTA 
2501 CCTCCTGAGC GCCCGGGAGC GCACGGGGGG CCGAGGCGCC 

CGGCGCAAGT GGGTGGAGTG CGTGTGCCTG CCCGGCTGGC 

ACGGACCCAG CTGCGGCGTG 
2601 CCCACTGTGG TGCAGTACTC CAACCTGCCC ACCAAGGAGC 

GGCTGGTGCC CAGGGAGGTG CCGCGCCGCG TCATCAACGC

CATCAACGTC AACCACGAGT 

NotI



2701 TCGACCTGCT GGACGTGCGC TTCCACGAGC TGGGCGACGT 

GGTGGACGCC TTTGTGGTGT GCGAGTCCAA CTTCACGGCT 

TATGGGGAGC CGCGGCCGCT 
2801 CAAGTTCCGG GAGATGCTGA CCAATGGCAC CTTCGAGTAC 

ATCCGCCACA AGGTGCTCTA TGTCTTCCTG GACCACTTCC 

CGCCCGGCGG CCGGCAGGAC 

FspI

FspI



2901 GGCTGGATCG CCGACGACTA CCTGCGCACC TTCCTCACCC 
AGGACGGCGT CTCGCGGCTG CGCAACCTGC GGCCCGACGA 
CGTCTTCATC ATTGACGATG 

FspI



3001 CGGACGAGAT CCCGGCCCGT GACGGCGTCC TTTTCCTCAA 

GCTCTACGAT GGCTGGACCG AGCCCTTCGC CTTCCACATG 

CGCAAGTCGC TCTACGGCTT 
3101 CTTCTGGAAG CAGCCGGGCA CCCTGGAGGT GGTGTCAGGC 

TGCACGGTGG ACATGCTGCA GGCAGTGTAT GGGCTGGACG 

GCATCCGCCT GCGCCGCCGC 
3201 CAGTACTACA CCATGCCCAA CTTCAGACAG TATGAGAACC 

GCACCGGCCA CATCCTGGTG CAGTGGTCGC TGGGCAGCCC

CCTGCACTTC GCCGGCTGGC 
3301 ACTGCTCCTG GTGCTTCACG CCCGAGGGCA TCTACTTCAA

GCTCGTGTCC GCCCAGAATG GCGACTTCCC ACGCTGGGGT

GACTACGAGG ACAAGCGGGA 
3401 CCTGAACTAC ATCCGCGGCC TGATCCGCAC CGGGGGCTGG 

TTCGACGGCA CGCAGCAGGA GTACCCGCCT GCAGACCCCA

GCGAGCACAT GTATGCGCCC 
3501 AAGTACCTGC TGAAGAACTA CGACCGGTTC GACTACCTGC 

TGGACAACCC CTACCAGGAG CCCAGGAGCA CGGCGGCGGG 

CGGGTGGCGC CACAGGGGTC

BamHI PmeI



3601 CCGAGGGAAG GCCGCCCGCC CGGGGCAAAC TGGACGAGGC 

GGAAGTCGAA CAAAAACTCA TCTCAGAAGA GGATCTGAAT 

TAGGATCCTA GGTTTAAACT 
3701 GAGGGCACTG AAGTCGCTTG ATGTGCTGAA TTGTTTGTGA 

TGTTGGTGGC GTATTTTGTT TAAATAAGTA AGCATGGCTG 

TGATTTTATC ATATGATCGA 
3801 TCTTTGGGGT TTTATTTAAC ACATTGTAAA ATGTGTATCT 

ATTAATAACT CAATGTATAA GATGTGTTCA TTCTTCGGTT 

GCCATAGATC TGCTTATTTG 
3901 ACCTGTGATG TTTTGACTCC AAAAACCAAA ATCACAACTC 

AATAAACTCA TGGAATATGT CCACCTGTTT CTTGAAGAGT 

TCATCTACCA TTCCAGTTGG 

FseI



4001 CATTTATCAG TGTTGCAGCG GCGCTGTGCT TTGTAACATA 
ACAATTGTTC ACGGCATATA TCCACGGCCG GCCTAGCTAG 
CCACGGTGGC CAGATCCACT 
NotI 



4101 AGTTCTAGAG CGGCCGCTTA ATTCACTGGC CGTCGTTTTA 
CAACGTCGTG ACTGGGAAAA CCCTGGCGTT ACCCAACTTA 
ATCGCCTTGC AGCACATCCC 

FspI



4201 CCTTTCGCCA GCTGGCGTAA TAGCGAAGAG GCCCGCACCG 

ATCGCCCTTG CCAACAGTTG CGCAGCCTGA ATGGCGAATG 

GCGCCTGATG CGGTATTTTC 
4301 TCCTTACGCA TCTGTGCGGT ATTTCACACC GCATATGGTG 

CACTCTCAGT ACAATCTGCT CTGATGCCGC ATAGTTAAGC 

CAGCCCCGAC ACCCGCCAAC 
4401 ACCCGCTGAC GCGCCCTGAC GGGCTTGTCT GCTCCCGGCA

TCCGCTTACA GACAAGCTGT GACCGTCTCC GGGAGCTGCA

TGTGTCAGAG GTTTTCACCG 
4501 TCATCACCGA AACGCGCGAG ACGAAAGGGC CTCGTGATAC 

GCCTATTTTT ATAGGTTAAT GTCATGATAA TAATGGTTTC 

TTAGACGTCA GGTGGCACTT 
4601 TTCGGGGAAA TGTGCGCGGA ACCCCTATTT GTTTATTTTT 

CTAAATACAT TCAAATATGT ATCCGCTCAT GAGACAATAA 

CCCTGATAAA TGCTTCAATA 
4701 ATATTGAAAA AGGAAGAGTA TGAGTATTCA ACATTTCCGT 

GTCGCCCTTA TTCCCTTTTT TGCGGCATTT TGCCTTCCTG 

TTTTTGCTCA CCCAGAAACG 
4801 CTGGTGAAAG TAAAAGATGC TGAAGATCAG TTGGGTGCAC

GAGTGGGTTA CATCGAACTG GATCTCAACA GCGGTAAGAT 

CCTTGAGAGT TTTCGCCCCG 
4901 AAGAACGTTT TCCAATGATG AGCACTTTTA AAGTTCTGCT 

ATGTGGCGCG GTATTATCCC GTATTGACGC CGGGCAAGAG 

CAACTCGGTC GCCGCATACA 
5001 CTATTCTCAG AATGACTTGG TTGAGTACTC ACCAGTCACA 

GAAAAGCATC TTACGGATGG CATGACAGTA AGAGAATTAT

GCAGTGCTGC CATAACCATG 
5101 AGTGATAACA CTGCGGCCAA CTTACTTCTG ACAACGATCG 

GAGGACCGAA GGAGCTAACC GCTTTTTTGC ACAACATGGG 

GGATCATGTA ACTCGCCTTG 

FspI



5201 ATCGTTGGGA ACCGGAGCTG AATGAAGCCA TACCAAACGA 

CGAGCGTGAC ACCACGATGC CTGTAGCAAT GGCAACAACG 

TTGCGCAAAC TATTAACTGG 
5301 CGAACTACTT ACTCTAGCTT CCCGGCAACA ATTAATAGAC 

TGGATGGAGG CGGATAAAGT TGCAGGACCA CTTCTGCGCT 

CGGCCCTTCC GGCTGGCTGG 
5401 TTTATTGCTG ATAAATCTGG AGCCGGTGAG CGTGGGTCTC 

GCGGTATCAT TGCAGCACTG GGGCCAGATG GTAAGCCCTC 

CCGTATCGTA GTTATCTACA 
5501 CGACGGGGAG TCAGGCAACT ATGGATGAAC GAAATAGACA 

GATCGCTGAG ATAGGTGCCT CACTGATTAA GCATTGGTAA 

CTGTCAGACC AAGTTTACTC 
5601 ATATATACTT TAGATTGATT TAAAACTTCA TTTTTAATTT 

AAAAGGATCT AGGTGAAGAT CCTTTTTGAT AATCTCATGA 

CCAAAATCCC TTAACGTGAG 
5701 TTTTCGTTCC ACTGAGCGTC AGACCCCGTA GAAAAGATCA 

AAGGATCTTC TTGAGATCCT TTTTTTCTGC GCGTAATCTG 

CTGCTTGCAA ACAAAAAAAC 
5801 CACCGCTACC AGCGGTGGTT TGTTTGCCGG ATCAAGAGCT
ACCAACTCTT TTTCCGAAGG TAACTGGCTT CAGCAGAGCG 

CAGATACCAA ATACTGTCCT 
5901 TCTAGTGTAG CCGTAGTTAG GCCACCACTT CAAGAACTCT 

GTAGCACCGC CTACATACCT CGCTCTGCTA ATCCTGTTAC 

CAGTGGCTGC TGCCAGTGGC 
6001 GATAAGTCGT GTCTTACCGG GTTGGACTCA AGACGATAGT 

TACCGGATAA GGCGCAGCGG TCGGGCTGAA CGGGGGGTTC 

GTGCACACAG CCCAGCTTGG 
6101 AGCGAACGAC CTACACCGAA CTGAGATACC TACAGCGTGA 

GCATTGAGAA AGCGCCACGC TTCCCGAAGG GAGAAAGGCG 

GACAGGTATC CGGTAAGCGG 
6201 CAGGGTCGGA ACAGGAGAGC GCACGAGGGA GCTTCCAGGG 

GGAAACGCCT GGTATCTTTA TAGTCCTGTC GGGTTTCGCC 

ACCTCTGACT TGAGCGTCGA 
6301 TTTTTGTGAT GCTCGTCAGG GGGGCGGAGC CTATGGAAAA

ACGCCAGCAA CGCGGCCTTT TTACGGTTCC TGGCCTTTTG 

CTGGCCTTTT GCTCACATGT 
6401 TCTTTCCTGC GTTATCCCCT GATTCTGTGG ATAACCGTAT 

TACCGCCTTT GAGTGAGCTG ATACCGCTCG CCGCAGCCGA 

ACGACCGAGC GCAGCGAGTC 
6501 AGTGAGCGAG GAAGCGGAAG AGCGCCCAAT ACGCAAACCG 

CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 

ACGACAGGTT TCCCGACTGG 
6601 AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC 

GGCTCGTATG TTGTGTGGAA 

NotI



6701 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA 
CCATGATTAC GCCAAGCTAG CGGCCGCATT CCCGGGAAGC 
TAGGCCACCG TGGCCCGCCT 
HindIII



6801 GCAGGGGAAG CTTGCATG 
PmeI (3651)
Plasmid pDAB8504 (7545 bp)

Description: Cloning vector with Rb7 MARs (inverted
orientation) flanking a multiple cloning site and the rice
actin/PAT/lipase selectable marker cassette



Start-End    Feature
1-1166       Tobacco Rb7 MARs
1167-1304    Linker sequence with multiple cloning site:
             TGGCCACCGCTTAATTAAGGCGCGCCATGCCCGGGCAAGCGGCCGCTTAATTAAATTTAAATGTTTAAACTAGGAAATCCAAGCTTGGGCTGCAGGTCAATCCCATTGCTTTTGAAGCAGCTCAACATTGATCTCTTT
1305-2701    Rice actin promoter with intron
2235-2696    Rice actin intron
2702-2703    CC
2704-3258    PAT gene (phosphinothricin acetyltransferase)
3259-3272    Linker sequence: GGTACCCTGAGCTC
3273-3629    Maize lipase 3' UTR
3630-3670    Linker sequence: GAATTCATATTTCCTCCTGCAGGGTTTAAACTTGCCGTGGC
3671-4836    Tobacco Rb7 MAR (complementary)
4837-4857    Linker sequence: CGGCCCACTAGTCACCGGTGT
4858-5103    pUC19
5104-5130    Linker sequence: GCGCACGCTGCGCACGCTGCGCACGCT
5130-7523    pUC19
7524-7545    Linker sequence: ACACCGGTGTGATCATGGGCCG



Sequence 

1 CGATTAAAAA TCTCAATTAT ATTTGGTCTA ATTTAGTTTG GTATTGAGTA 

51 AAACAAATTC GAACCAAACC AAAATATAAA TATATAGTTT TTATATATAT 

101 GCCTTTAAGA CTTTTTATAG AATTTTCTTT AAAAAATATC TAGAAATATT 

151 TGCGACTCTT CTGGCATGTA ATATTTCGTT AAATATGAAG TGCTCCATTT 

201 TTATTAACTT TAAATAATTG GTTGTACGAT CACTTTCTTA TCAAGTGTTA 

251 CTAAAATGCG TCAATCTCTT TGTTCTTCCA TATTCATATG TCAAAACCTA 

301 TCAAAATTCT TATATATCTT TTTCGAATTT GAAGTGAAAT TTCGATAATT

351 TAAAATTAAA TAGAACATAT CATTATTTAG GTATCATATT GATTTTTATA 

401 CTTAATTACT AAATTTGGTT AACTTTGAAA GTGTACATCA ACGAAAAATT

451 AGTCAAACGA CTAAAATAAA TAAATATCAT GTGTTATTAA GAAAATTCTC

501 CTATAAGAAT ATTTTAATAG ATCATATGTT TGTAAAAAAA ATTAATTTTT 

551 ACTAACACAT ATATTTACTT ATCAAAAATT TGACAAAGTA AGATTAAAAT 

601 AATATTCATC TAACAAAAAA AAAACCAGAA AATGCTGAAA ACCCGGCAAA 

651 ACCGAACCAA TCGAAACCGA TATAGTTGGT TTGGTTTGAT TTTGATATAA

701 ACCGAACCAA CTCGGTCCAT TTGCACCCCT AATCATAATA GCTTTAATAT 

751 TTCAAGATAT TATTAAGTTA ACGTTGTCAA TATCCTGGAA ATTTTGCAAA 

801 ATGAATCAAG CCTATATGGC TGTAATATGA ATTTAAAAGC AGCTCGATGT 

851 GGTGGTAATA TGTAATTTAC TTGATTCTAA AAAAATATCC CAAGTATTAA 
901 TAATTTCTGC TAGGAAGAAG GTTAGCTACG ATTTACAGCA AAGCCAGAAT

951 ACAATGAACC ATAAAGTGAT TGAAGCTCGA AATATACGAA GGAACAAATA

1001 TTTTTAAAAA AATACGCAAT GACTTGGAAC AAAAGAAAGT GATATATTTT

1051 TTGTTCTTAA ACAAGCATCC CCTCTAAAGA ATGGCAGTTT TCCTTTGCAT

1101 GTAACTATTA TGCTCCCTTC GTTACAAAAA TTTTGGACTA CTATTGGGAA

1151 CTTCTTCTGA AAATAGTGGC CACCGCTTAA TTAAGGCGCG CCATGCCCGG

1201 GCAAGCGGCC GCTTAATTAA ATTTAAATGT TTAAACTAGG AAATCCAAGC

1251 TTGGGCTGCA GGTCAATCCC ATTGCTTTTG AAGCAGCTCA ACATTGATCT

1301 CTTTCTCGAG GTCATTCATA TGCTTGAGAA GAGAGTCGGG ATAGTCCAAA

1351 ATAAAACAAA GGTAAGATTA CCTGGTCAAA AGTGAAAACA TCAGTTAAAA

1401 GGTGGTATAA AGTAAAATAT CGGTAATAAA AGGTGGCCCA AAGTGAAATT

1451 TACTCTTTTC TACTATTATA AAAATTGAGG ATGTTTTTGT CGGTACTTTG

1501 ATACGTCATT TTTGTATGAA TTGGTTTTTA AGTTTATTCG CTTTTGGAAA

1551 TGCATATCTG TATTTGAGTC GGGTTTTAAG TTCGTTTGCT TTTGTAAATA

1601 CAGAGGGATT TGTATAAGAA ATATCTTTAA AAAAACCCAT ATGCTAATTT

1651 GACATAATTT TTGAGAAAAA TATATATTGA GGCGAATTCT CACAATGAAC

1701 AATAATAAGA TTAAAATAGC TTTCCCCCGT TGCAGCGCAT GGGTATTTTT

1751 TCTAGTAAAA ATAAAAGATA AACTTAGACT CAAAACATTT ACAAAAACAA

1801 CCCCTAAAGT TCCTAAAGCC CAAAGTGCTA TCCACGATCC ATAGCAAGCC

1851 CAGCCCAACC CAACCCAACC CAACCCACCC CAGTCCAGCC AACTGGACAA

1901 TAGTCTCCAC ACCCCCCCAC TATCACCGTG AGTTGTCCGC ACGCACCGCA

1951 CGTCTCGCAG CCAAAAAAAA AAAAAGAAAG AAAAAAAAGA AAAAGAAAAA

2001 ACAGCAGGTG GGTCCGGGTC GTGGGGGCCG GAAACGCGAG GAGGATCGCG

2051 AGCCAGCGAC GAGGCCGGCC CTCCCTCCGC TTCCAAAGAA ACGCCCCCCA

2101 TCGCCACTAT ATACATACCC CCCCCTCTCC TCCCATCCCC CCAACCCTAC

2151 CACCACCACC ACCACCACCT CCACCTCCTC CCCCCTCGCT GCCGGACGAC

2201 GCCTCCCCCC TCCCCCTCCG CCGCCGCCGC GCCGGTAACC ACCCCGCCCC

2251 TCTCCTCTTT CTTTCTCCGT TTTTTTTTTC CGTCTCGGTC TCGATCTTTG

2301 GCCTTGGTAG TTTGGGTGGG CGAGAGGCGG CTTCGTGCGC GCCCAGATCG

2351 GTGCGCGGGA GGGGCGGGAT CTCGCGGCTG GGGCTCTCGC CGGCGTGGAT

2401 CCGGCCCGGA TCTCGCGGGG AATGGGGCTC TCGGATGTAG ATCTGCGATC

2451 CGCCGTTGTT GGGGGAGATG ATGGGGGGTT TAAAATTTCC GCCATGCTAA

2501 ACAAGATCAG GAAGAGGGGA AAAGGGCACT ATGGTTTATA TTTTTATATA

2551 TTTCTGCTGC TTCGTCAGGC TTAGATGTGC TAGATCTTTC TTTCTTCTTT

2601 TTGTGGGTAG AATTTGAATC CCTCAGCATT GTTCATCGGT AGTTTTTCTT

2651 TTCATGATTT GTGACAAATG CAGCCTCGTG CGGAGCTTTT TTGTAGGTAG

2701 ACCATGGCTT CTCCGGAGAG GAGACCAGTT GAGATTAGGC CAGCTACAGC

2751 AGCTGATATG GCCGCGGTTT GTGATATCGT TAACCATTAC ATTGAGACGT

2801 CTACAGTGAA CTTTAGGACA GAGCCACAAA CACCACAAGA GTGGATTGAT

2851 GATCTAGAGA GGTTGCAAGA TAGATACCCT TGGTTGGTTG CTGAGGTTGA

2901 GGGTGTTGTG GCTGGTATTG CTTACGCTGG GCCCTGGAAG GCTAGGAACG

2951 CTTACGATTG GACAGTTGAG AGTACTGTTT ACGTGTCACA TAGGCATCAA

3001 AGGTTGGGCC TAGGATCCAC ATTGTACACA CATTTGCTTA AGTCTATGGA

3051 GGCGCAAGGT TTTAAGTCTG TGGTTGCTGT TATAGGCCTT CCAAACGATC

3101 CATCTGTTAG GTTGCATGAG GCTTTGGGAT ACACAGCCCG GGGTACATTG

3151 CGCGCAGCTG GATACAAGCA TGGTGGATGG CATGATGTTG GTTTTTGGCA

3201 AAGGGATTTT GAGTTGCCAG CTCCTCCAAG GCCAGTTAGG CCAGTTACCC

3251 AGATCTGAGG TACCCTGAGC TCGGTCGCAG CGTGTGCGTG TCCGTCGTAC

3301 GTTCTGGCCG GCCGGGCCTT GGGCGCGCGA TCAGAAGCGT TGCGTTGGCG

3351 TGTGTGTGCT TCTGGTTTGC TTTAATTTTA CCAAGTTTGT TTCAAGGTGG

3401 ATCGCGTGGT CAAGGCCCGT GTGCTTTAAA GACCCACCGG CACTGGCAGT

3451 GAGTGTTGCT GCTTGTGTAG GCTTTGGTAC GTATGGGCTT TATTTGCTTC
3501 [illegible] GTACTACTTG GGTTTGTTGA ATTATTATGA GCAGTTGCGT

3551 [illegible] AGCTGGGCTA CCTGGACATT GTTATGTATT AATAAATGCT

3601 [illegible] CTAAAGATCT TTAAGTGCTG AATTCATATT TCCTCCTGCA

3651 GGGTTTAAAC TTGCCGTGGC CTATTTTCAG AAGAAGTTCC CAATAGTAGT

3701 [illegible] TGTAACGAAG GGAGCATAAT AGTTACATGC AAAGGAAAAC

3751 [illegible] TAGAGGGGAT GCTTGTTTAA GAACAAAAAA TATATCACTT

3801 [illegible] CAAGTCATTG CGTATTTNTT TAAAAATATT TGTTCCTTCG

3851 [illegible] GCTTCAATCA CTTTATGGTT CTTTGTATTC TGGCTTTGCT

3901 [illegible] GCTAACCTTC TTCCTAGCAG AAATTATTAA TACTTGGGAT

3951 [illegible] AATCAAGTAA ATTACATATT ACCACCACAT CGAGCTGCTT

4001 [illegible] ATTACAGCCA TATAGGCTTG ATTCATTTTG CAAAATTTCC

4051 [illegible] CAACGTTAAC TTAATAATAT CTTGAAATAT TAAAGCTATT

4101 [illegible] GTGCAAATGG ACCGAGTTGG TTCGGTTTAT ATCAAAATCA

4151 [illegible] ACTATATCGG TTTGGATTGG TTCGGTTTTG CCGGGTTTTC

4201 [illegible] GGTTTTTTTT TTGTTAGATG AATATTATTT TAATCTTACT

4251 [illegible] TTTGATAAGT AAATATATGT GTTAGTAAAA ATTAATTTTT

4301 [illegible] TATGATCTAT TAAAATATTC TTATAGGAGA ATTTTCTTAA

4351 [illegible] TATTTATTTA TTTTAGTCGT TTGACTAATT TTTCGTTGAT

4401 [illegible] AAAGTTAACC AAATTTAGTA ATTAAGTATA AAAATCAATA

4451 [illegible] ATAATGATAT GTTCTATTTA ATTTTAAATT ATCGAAATTT

4501 [illegible] TCGAAAAAGA TATATAAGAA TTTTGATAGA TTTTGACATA

4551 [illegible] AGAACAAAGA GATTGACGCA TTTTAGTAAC ACTTGATAAG

4601 [illegible] TACAACCAAT TATTTAAAGT TAATAAAAAT GGAGCACTTC

4651 [illegible] AAATATTACA TGCCAGAAGA GTCGCAAATA TTTCTAGATA

4701 [illegible] AAAATTCTAT AAAAAGTCTT AAAGGCATAT ATATAAAAAC

4751 [illegible] TATTTTGGTT TGGTTCGAAT TTGTTTTACT CAATACCAAA

4801 [illegible] CCAAATATAA TTGGGATTTT TAATCGCGGC CCACTAGTCA

4851 [illegible] TTGGCGTAAT CATGGTCATA GCTGTTTCCT GTGTGAAATT

4901 [illegible] CACAATTCCA CACAACATAC GAGCCGGAAG CATAAAGTGT

4951 [illegible] GTGCCTAATG AGTGAGCTAA CTCACATTAA TTGCGTTGCG

5001 [illegible] GCTTTCCAGT CGGGAAACCT GTCGTGCCAG CTGCATTAAT

5051 GAATCGGCCA ACGCGCGGGG AGAGGCGGTT TGCGTATTGG GCGCTCTTCC

5101 GCNGCGCACG CTGCGCACGC TGCGCACGCT TCCTCGCTCA CTGACTCGCT

5151 GCGCTCGGTC GTTCGGCTGC GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG

5201 TAATACGGTT ATCCACAGAA TCAGGGGATA ACGCAGGAAA GAACATGTGA

5251 GNAAAAGGCC AGCAAAAGGC CAGGAACCGT AAAAAGGCCG CGTTGCTGGC

5301 [illegible] AGGCTCCGCC CCCCTGACGA GCATCACAAA AATCGACGCT

5351 [illegible] GTGGCGAAAC CCGACAGGAC TATAAAGATA CCAGGCGTTT

5401 [illegible] GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC

5451 [illegible] TCCGCCTTTC TCCCTTCGGG AAGCGTGGCG CTTTCTCATA

5501 [illegible] TAGGTATCTC AGTTCGGTGT AGGTCGTTCG CTCCAAGCTG

5551 [illegible] ACGAACCCCC CGTTCAGCCC GACCGCTGCG CCTTATCCGG

5601 [illegible] CTTGAGTCCA ACCCGGTAAG ACACGACTTA TCGCCACTGG

5651 [illegible] TGGTAACAGG ATTAGCAGAG CGAGGTATGT AGGCGGTGCT

5701 [illegible] TGAAGTGGTG GCCTAACTAC GGCTACACTA GAAGGACAGT

5751 ATTTGGTATC TGCGCTCTGC TGAAGCCAGT TACCTTCGGA AAAAGAGTTG

5801 GTAGCTCTTG ATCCGGCAAA CAAACCACCG CTGGTAGCGG TGGTTTTTTT

5851 GTTTGCAAGC AGCAGATTAC GCGCAGAAAA AAAGGATCTC AAGAAGATCC

5901 TTTGATCTTT TCTACGGGGT CTGACGCTCA GTGGAACGAA AACTCACGTT

5951 AAGGGATTTT GGTCATGAGA TTATCAAAAA GGATCTTCAC CTAGATCCTT

6001 TTAAATTAAA AATGAAGTTT TAAATCAATC TAAAGTATAT ATGAGTAAAC

6051 TTGGTCTGAC AGTTACCAAT GCTTAATCAG TGAGGCACCT ATCTCAGCGA
6101 TCTGTCTATT TCGTTCATCC ATAGTTGCCT GACTCCCCGT CGTGTAGATA

6151 ACTACGATAC GGGAGGGCTT ACCATCTGGC CCCAGTGCTG CAATGATACC 

6201 GCGAGACCCA CGCTCACCGG CTCCAGATTT ATCAGCAATA AACCAGCCAG 

6251 CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC CGCCTCCATC 

6301 CAGTCTATTA ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA 

6351 TAGTTTGCGC AACGTTGTTG CCATTGCTAC AGGCATCGTG GTGTCACGCT 

6401 CGTCGTTTGG TATGGCTTCA TTCAGCTCCG GTTCCCAACG ATCAAGGCGA

6451 GTTACATGAT CCCCCATGTT GTGCAAAAAA GCGGTTAGCT CCTTCGGTCC 

6501 TCCGATCGTT GTCAGAAGTA AGTTGGCCGC AGTGTTATCA CTCATGGTTA 

6551 TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT AAGATGCTTT 

6601 TCTGTGACTG GTGAGTACTC AACCAAGTCA TTCTGAGAAT AGTGTATGCG 

6651 GCGACCGAGT TGCTCTTGCC CGGCGTCAAT ACGGGATAAT ACCGCGCCAC 

6701 ATAGCAGAAC TTTAAAAGTG CTCATCATTG GAAAACGTTC TTCGGGGCGA 

6751 AAACTCTCAA GGATCTTACC GCTGTTGAGA TCCAGTTCGA TGTAACCCAC 

6801 TCGTGCACCC AACTGATCTT CAGCATCTTT TACTTTCACC AGCGTTTCTG 

6851 GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG AATAAGGGCG 

6901 ACACGGAAAT GTTGAATACT CATACTCTTC CTTTTTCAAT ATTATTGAAG 

6951 CATTTATCAG GGTTATTGTC TCATGAGCGG ATACATATTT GAATGTATTT 

7001 AGAAAAATAA ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA 

7051 CCTGACGTCT AAGAAACCAT TATTATCATG ACATTAACCT ATAAAAATAG 

7101 GCGTATCACG AGGCCCTTTC GTCTCGCGCG TTTCGGTGAT GACGGTGAAA 

7151 ACCTCTGACA CATGCAGCTC CCGGAGACGG TCACAGCTTG TCTGTAAGCG 

7201 GATGCCGGGA GCAGACAAGC CCGTCAGGGC GCGTCAGCGG GTGTTGGCGG 

7251 GTGTCGGGGC TGGCTTAACT ATGCGGCATC AGAGCAGATT GTACTGAGAG 

7301 TGCACCATAT GCGGTGTGAA ATACCGCACA GATGCGTAAG GAGAAAATAC 

7351 CGCATCAGGC GCCATTCGCC ATTCAGGCTG CGCAACTGTT GGGMiGGGCG 

7401 ATCGGTGCGG GCCTCTTCGC TATTACGCCA GCTGGCGAAA GGGGGATGTG 

7451 CTGCAAGGCG ATTAAGTTGG GTAACGCCAG GGTTTTCCCA GTCACGACGT 

7501 TGTAAAACGA CGGCCAGTGA ATTACACCGG TGTGATCATG GGCCG
pDAB7113 11643bp



Sequence 


Feature 


1-1164


RB7 MAR v3 


1165-1233 

Linker 


TGGCCACCGCTTAATTAAGGCGCGCCATGCCCCCTGCAG 
ATCCCCGGGGATCCTCTAGAGTCGACCTGC 


1234-3224 


Maize Ubiquitin 1 promoter 


3224-4891 


GNT III V.2 


3627 


C to replace G as reported in original 
sequence 


4892-4895 
Linker 


TAGGTTT 


4896-5260 


Maize Per5 3'UTR v2 


5261-5404 
multiple 
cloning sites 


CGGCCGGCCTAGCTAGCCACGGTGGCCAGATCCACTAGG 
GGCAAGCGGCCGCTTAATTAAATTTAAATGTTTAAACTA 
GGAAATCCAAGCTTGGGCTGCAGGTCAATCCCATTGCTT 
TTGAAGCAGCTCAACATTGATCTCTTT 


5405-6802 


Rice Actinl Promoter v2 


6803-7358 


PAT v3 


7359-7372 
Linker 


GGTACCCTGAGCTC 


7373-7729 


Maize Lipase 3' UTR vl 


7730-7770 
Linker 


GAATTCATATTTCCTCCTGGAGGGTTTAAACTTGCCGTG 
GC 


7771-8934 


RB7 MAR v3 


8935-11643 


PUC19 


9201-9225 


3 Fspl sites (TGCGCAA) with CG in 
between sites 


10164-11021 


Ampicillin resistance gene 


10454-10459 


TGCGCAA Fspl 


11477-11482 


TGCGCAA Fspl 


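Because the feature map uses 1-based, inclusive coordinates, two quick consistency checks are possible: a linker's printed sequence should be exactly `end - start + 1` bases long, and each FspI entry should coincide with a TGCGCA recognition sequence. The Python sketch below is illustrative only; the dictionary and function names are hypothetical, and it assumes the inclusive 1-based numbering used in the map. The FspI fragments are copied from the FIG. 8B rows at positions 9201, 10451 and 11471.

```python
# Sketch: consistency checks on the pDAB7113 feature map.
# Assumptions: 1-based, inclusive coordinates; FspI recognition site = TGCGCA.

# Linker / MCS sequences as printed in the feature map.
PRINTED_LINKERS = {
    (1165, 1233): "TGGCCACCGCTTAATTAAGGCGCGCCATGCCCCCTGCAG"
                  "ATCCCCGGGGATCCTCTAGAGTCGACCTGC",
    (5261, 5404): "CGGCCGGCCTAGCTAGCCACGGTGGCCAGATCCACTAGG"
                  "GGCAAGCGGCCGCTTAATTAAATTTAAATGTTTAAACTA"
                  "GGAAATCCAAGCTTGGGCTGCAGGTCAATCCCATTGCTT"
                  "TTGAAGCAGCTCAACATTGATCTCTTT",
    (7359, 7372): "GGTACCCTGAGCTC",
    (7730, 7770): "GAATTCATATTTCCTCCTGGAGGGTTTAAACTTGCCGTG" "GC",
}

FSPI = "TGCGCA"


def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based interval."""
    return end - start + 1


def check_linker_lengths(linkers: dict) -> list:
    """Return (start, end, span, printed_length) for each entry that disagrees."""
    bad = []
    for (start, end), seq in linkers.items():
        if span_length(start, end) != len(seq):
            bad.append((start, end, span_length(start, end), len(seq)))
    return bad


def find_sites(seq: str, site: str, offset: int = 0) -> list:
    """1-based start positions of every match of `site` in `seq`.

    `offset` is the map position just before the fragment's first base,
    so that hits are reported in plasmid coordinates.
    """
    seq, site = seq.upper(), site.upper()
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(offset + i + 1)
        i = seq.find(site, i + 1)
    return hits


print(check_linker_lengths(PRINTED_LINKERS))                                # []
print(find_sites("TGCGCACGCTGCGCACGCTGCGCACGCTTC", FSPI, offset=9200))      # [9201, 9210, 9219]
print(find_sites("GTTTGCGCAA", FSPI, offset=10450))                         # [10454]
print(find_sites("TCAGGCTGCGCAACTGTTGG", FSPI, offset=11470))               # [11477]
```

The 4892-4895 linker is left out of `PRINTED_LINKERS`: it lists seven bases (TAGGTTT) for a 4 bp interval, so `check_linker_lengths` would report it.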

1 CGATTAAAAA CCCAATTATA TTTGGTCTAA TTTAGTTTGG 

TATTGAGTAA AACAAATTCG AACCAAACCA AAATATAAAT 

ATATAGTTTT TATATATATG 
101 CCTTTAAGAC TTTTTATAGA ATTTTCTTTA AAAAATATCT 

AGAAATATTT GCGACTCTTC TGGCATGTAA TATTTCGTTA 

AATATGAAGT GCTCCATTTT 
201 TATTAACTTT AAATAATTGG TTGTACGATC ACTTTCTTAT 

CAAGTGTTAC TAAAATGCGT CAATCTCTTT GTTCTTCCAT 

ATTCATATGT CAAAATCTAT 
301 CAAAATTCTT ATATATCTTT TTCGAATTTG AAGTGAAATT 

TCGATAATTT AAAATTAAAT AGAACATATC ATTATTTAGG

TATCATATTG ATTTTTATAC 
401 TTAATTACTA AATTTGGTTA ACTTTGAAAG TGTACATCAA

CGAAAAATTA GTCAAACGAC TAAAATAAAT AAATATCATG 

TGTTATTAAG AAAATTCTCC 
501 TATAAGAATA TTTTAATAGA TCATATGTTT GTAAAAAAAA 

TTAATTTTTA CTAACACATA TATTTACTTA TCAAAAATTT 

GACAAAGTAA GATTAAAATA
601 ATATTCATCT AACAAAAAAA AAACCAGAAA ATGCTGAAAA

CCCGGCAAAA CCGAACCAAT CCAAACCGAT ATAGTTGGTT

TGGTTTGATT TTGATATAAA 
701 CCGAACCAAC TCGGTCCATT TGCACCCCTA ATCATAATAG

CTTTAATATT TCAAGATATT ATTAAGTTAA CGTTGTCAAT 

ATCCTGGAAA TTTTGCAAAA 
801 TGAATCAAGC CTATATGGCT GTAATATGAA TTTAAAAGCA

GCTCGATGTG GTGGTAATAT GTAATTTACT TGATTCTAAA 

AAAATATCCC AAGTATTAAT 
901 AATTTCTGCT AGGAAGAAGG TTAGCTACGA TTTACAGCAA 

AGCCAGAATA CAAAGAACCA TAAAGTGATT GAAGCTCGAA 

ATATACGAAG GAACAAATAT 
1001 TTTTAAAAAA ATACGGAATG ACTTGGAACA AAAGAAAGTG 

ATATATTTTT TGTTCTTAAA CAAGCATCCC CTCTAAAGAA 

TGGCAGTTTT CCTTTGCATG 

PacI



AscI 



1101 TAACTATTAT GCTCCCTTCG TTACAAAAAT TTTGGACTAC 
TATTGGGAAT TCTTCTGAAA ATAGTGGCCA CCGCTTAATT 
AAGGCGCGCC ATGCCCCCTG 
BamHI 



1201 CAGATCCCCG GGGATCCTCT AGAGTCGACC TGCAGTGCAG 

CGTGACCCGG TCGTGCCCCT CTCTAGAGAT AATGAGCATT 

GCATGTCTAA GTTATAAAAA 
1301 ATTACCACAT ATTTTTTTTG TCACACTTGT TTGAAGTGCA 

GTTTATCTAT CTTTATACAT ATATTTAAAC TTTAATCTAC 

GAATAATATA ATCTATAGTA 
1401 CTACAATAAT ATCAGTGTTT TAGAGAATCA TATAAATGAA 

CAGTTAGACA TGGTCTAAAG GACAATTGAG TATTTTGACA 

ACAGGACTCT ACAGTTTTAT 
1501 CTTTTTAGTG TGCATGTGTT CTCCTTTTTT TTTGCAAATA 

GCTTCACCTA TATAATACTT CATCCATTTT ATTAGTACAT 

CCATTTAGGG TTTAGGGTTA 
1601 ATGGTTTTTA TAGACTAATT TTTTTAGTAC ATCTATTTTA

TTCTATTTTA GCCTCTAAAT TAAGAAAACT AAAACTCTAT 

TTTAGTTTTT TTATTTAATA 
1701 ATTTAGATAT AAAATAGAAT AAAATAAAGT GACTAAAAAT 

TAAACAAATA CCCTTTAAGA AATTAAAAAA ACTAAGGAAA 

CATTTTTCTT GTTTCGAGTA 
1801 GATAATGCCA GCCTGTTAAA CGCCGTCGAC GAGTCTAACG 

GACACCAACC AGCGAACCAG CAGCGTCGCG TCGGGCCAAG 

CGAAGCAGAC GGCACGGCAT 
1901 CTCTGTCGCT GCCTCTGGAC CCCTCTCGAG AGTTCCGCTC 

CACCGTTGGA CTTGCTCCGC TGTCGGCATC CAGAAATTGC 

GTGGCGGAGC GGCAGACGTG 
2001 AGCCGGCACG GCAGGCGGCC TCCTCCTCCT CTCACGGCAC 

GGCAGCTACG GGGGATTCCT TTCCCACCGC TCCTTCGCTT 

TCCCTTCCTC GCCCGCCGTA 
2101 ATAAATAGAC ACCCCCTCCA CACCCTCTTT CCCCAACCTC 

GTGTTGTTCG GAGCGGACAC ACACACAACC AGATCTCCCC 

CAAATCCACC CGTCGGCACC 
2201 TCCGCTTCAA GGTACGCCGC TCGTCCTCCC CCCCCCCCCC 

TCTCTACCTT CTCTAGATCG GCGTTCCGGT CCATGCATGG 

TTAGGGCCCG GTAGTTCTAC 
2301 TTCTGTTCAT GTTTGTGTTA GATCCGTGTT TGTGTTAGAT 

CCGTGCTGCT AGCGTTCGTA CACGGATGCG ACCTGTACGT 

CAGACACGTT CTGATTGCTA 
2401 ACTTGCCAGT GTTTCTCTTT GGGGAATCCT GGGATGGCTC 

TAGCCGTTCC GCAGACGGGA TCGATTTCAT GATTTTTTTT 

GTTTCGTTGC ATAGGGTTTG 
2501 GTTTGCCCTT TTCCTTTATT TCAATATATG CCGTGCACTT 

GTTTGTCGGG TCATCTTTTC ATGCTTTTTT TTGTCTTGGT 

TGTGATGATG TGGTCTGGTT 
2601 GGGCGGTCGT TCTAGATCGG AGTAGAATTC TGTTTCAAAC 

TACCTGGTGG ATTTATTAAT TTTGGATCTG TATGTGTGTG 

CCATACATAT TCATAGTTAC 
2701 GAATTGAAGA TGATGGATGG AAATATCGAT CTAGGATAGG 

TATACATGTT GATGCGGGTT TTACTGATGC ATATACAGAG 

ATGCTTTTTG TTCGGTTGGT 
2801 TGTGATGATG TGGTGTGGTT GGGCGGTCGT TCATTCGTTC 

TAGATCGGAG TAGAATACTG TTTCAAACTA CCTGGTGTAT 

TTATTAATTT TGGAACTGTA 
2901 TGTGTGTGTC ATACATCTTC ATAGTTACGA GTTTAAGATG 

GATGGAAATA TCGATCTAGG ATAGGTATAC ATGTTGATGT 

GGGTTTTACT GATGCATATA 
3001 CATGATGGCA TATGCAGCAT CTATTCATAT GCTCTAACCT 

TGAGTACCTA TCTATTATAA TAAACAAGTA TGTTTTATAA 

TTATTTTGAT CTTGATATAC 
3101 TTGGATGATG GCATATGCAG CAGCTATATG TGGATTTTTT 

TAGCCCTGCC TTCATACGCT ATTTATTTGC TTGGTACTGT 

TTCTTTTGTC GATGCTCACC 



FIG. 8B CONT.



wo 03/078614 



29/41 



PCT/IB03/01562 



NcoI
FseI



3201 CTGTTGTTTG GTGTTACTTC TGCAGGGTAC CCCCGGGGTC 

GACCATGGTG ATGAGACGCT ACAAGCTCTT TCTCATGTTC 

TGTATGGCCG GCCTGTGCCT 
3301 CATCTCCTTC CTGCACTTCT TCAAGACCCT GTCCTATGTC

ACCTTCCCCC GAGAACTGGC CTCCCTCAGC CCTAACCTGG 

TGTCCAGCTT TTTCTGGAAC 
3401 AATGCCCCGG TCACGCCCCA GGCCAGCCCC GAGCCAGGAG 

GCCCTGACCT GCTGCGTACC CCACTCTACT CCCACTCGCC 

CCTGCTGCAG CCGCTGCCGC 

SacI



3501 CCAGCAAGGC GGCCGAGGAG CTCCACCGGG TGGACTTGGT 

GCTGCCCGAG GACACCACCG AGTATTTCGT GCGCACCAAG

GCCGGCGGCG TCTGCTTCAA 
3601 ACCCGGCACC AAGATGCTGG AGAGGCCCCC CCCGGGACGG 

CCGGAGGAGA AGCCTGAGGG GGCCAACGGC TCCTCGGCCC 

GGCGGCCACC CCGGTACCTC 
3701 CTGAGCGCCC GGGAGCGCAC GGGGGGCCGA GGCGCCCGGC 

GCAAGTGGGT GGAGTGCGTG TGCCTGCCCG GCTGGCACGG 

ACCCAGCTGC GGCGTGCCCA 
3801 CTGTGGTGCA GTACTCCAAC CTGCCCACCA AGGAGCGGCT 

GGTGCCCAGG GAGGTGCCGC GCCGCGTCAT CAACGCCATC 

AACGTCAACC ACGAGTTCGA 

NotI 



3901 CCTGCTGGAC GTGCGCTTCC ACGAGCTGGG CGACGTGGTG 

GACGCCTTTG TGGTGTGCGA GTCCAACTTC ACGGCTTATG 

GGGAGCCGCG GCCGCTCAAG 
4001 TTCCGGGAGA TGCTGACCAA TGGCACCTTC GAGTACATCC 

GCCACAAGGT GCTCTATGTC TTCCTGGACC ACTTCCCGCC 

CGGCGGCCGG CAGGACGGCT 
4101 GGATCGCCGA CGACTACCTG CGCACCTTCC TCACCCAGGA 

CGGCGTCTCG CGGCTGCGCA ACCTGCGGCC CGACGACGTC

TTCATCATTG ACGATGCGGA 
4201 CGAGATCCCG GCCCGTGACG GCGTCCTTTT CCTCAAGCTC 

TACGATGGCT GGACCGAGCC CTTCGCCTTC CACATGCGCA 

AGTCGCTCTA CGGCTTCTTC 
4301 TGGAAGCAGC CGGGCACCCT GGAGGTGGTG TCAGGCTGCA 

CGGTGGACAT GCTGCAGGCA GTGTATGGGC TGGACGGCAT 

CCGCCTGCGC CGCCGCCAGT 
4401 ACTACACCAT GCCCAACTTC AGACAGTATG AGAACCGCAC

CGGCCACATC CTGGTGCAGT GGTCGCTGGG CAGCCCCCTG 

CACTTCGCCG GCTGGCACTG
4501 CTCCTGGTGC TTCACGCCCG AGGGCATCTA CTTCAAGCTC 

GTGTCCGCCC AGAATGGCGA CTTCCCACGC TGGGGTGACT 

ACGAGGACAA GCGGGACCTG 
4601 AACTACATCC GCGGCCTGAT CCGCACCGGG GGCTGGTTCG 

ACGGCACGCA GCAGGAGTAC CCGCCTGCAG ACCCCAGCGA 

GCACATGTAT GCGCCCAAGT 
4701 ACCTGCTGAA GAACTACGAC CGGTTCCACT ACCTGCTGGA 

CAACCCCTAC CAGGAGCCCA GGAGCACGGC GGCGGGCGGG 

TGGCGCCACA GGGGTCCCGA 

BamHI PmeI



4801 GGGAAGGCCG CCCGCCCGGG GCAAACTGGA CGAGGCGGAA

GTCGAACAAA AACTCATCTC AGAAGAGGAT CTGAATTAGG 

ATCCTAGGTT TAAACTGAGG 
4901 GCACTGAAGT CGCTTGATGT GCTGAATTGT TTGTGATGTT 

GGTGGCGTAT TTTGTTTAAA TAAGTAAGCA TGGCTGTGAT 

TTTATCATAT GATCGATCTT 
5001 TGGGGTTTTA TTTAACACAT TGTAAAATGT GTATCTATTA 

ATAACTCAAT GTATAAGATG TGTTCATTCT TCGGTTGCCA 

TAGATCTGCT TATTTGACCT 
5101 GTGATGTTTT GACTCCAAAA ACCAAAATCA CAACTCAATA 

AACTCATGGA ATATGTCCAC CTGTTTCTTG AAGAGTTCAT 

CTACCATTCC AGTTGGCATT 

FseI



5201 TATCAGTGTT GCAGCGGCGC TGTGCTTTGT AACATAACAA 
TTGTTCACGG CATATATCCA CGGCCGGCCT AGCTAGCCAC 
GGTGGCCAGA TCCACTAGGG 
PacI



SwaI



NotI PmeI
HindIII



5301 GCAAGCGGCC GCTTAATTAA ATTTAAATGT TTAAACTAGG 
AAATCCAAGC TTGGGCTGCA GGTCAATCCC ATTGCTTTTG 
AAGCAGCTCA AGATTGATCT 
5401 CTTTCTCGAG GTCATTCATA TGCTTGAGAA GAGAGTCGGG

ATAGTCCAAA ATAAAACAAA GGTAAGATTA CCTGGTCAAA 

AGTGAAAACA TCAGTTAAAA 
5501 GGTGGTATAA AGTAAAATAT CGGTAATAAA AGGTGGCCCA 

AAGTGAAATT TACTCTTTTC TACTATTATA AAAATTGAGG 

ATGTTTTTGT CGGTACTTTG
5601 ATACGTCATT TTTGTATGA?^ TTGGTTTTTA AGTTTATTCG

CTTTTGGAAA TGCATATCTG TATTTGAGTC GGGTTTTAAG 

TTCGTTTGCT TTTGTA2ATA 
5701 CAGAGGGATT TGTATAAGAA ATATCTTTAA AAAAACCCAT 

ATGCTAATTT GACATAATTT TTGAGAAAAA TATATATTCA 

GGCGAATTCT CACAATGAAC 
5801 AATAATAAGA TTAAAATAGC TTTCCCCCGT TGCAGCGCAT 

GGGTATTTTT TCTAGTAAAA ATAAZ^GATA AACTTAGACT 

CAAAACATTT ACAAAAACAA 
5901 CCCCTAAAGT TCCTAAAGCC CAAAGTGCTA TCCACGATCC 

ATAGCAAGCC CAGCCCAACC CAACCCAACC CAACCCACCC 

CAGTCCAGCC AACTGGACAA 
6001 TAGTCTCCAC ACCCCCCCAC TATCACCGTG AGTTGTCCGC 

ACGCACCGCA CGTCTCGCAG CCAAAAAAAA AT^^GAAAG 

AAAAAAAAGA AAAAGAAA2^ 

FseI



6101 ACAGCAGGTG GGTCCGGGTC GTGGGGGCCG GAAACGCGAG 

GAGGATCGCG AGCCAGCGAC GAGGCCGGCC CTCCCTCCGC 

TTCCAAAGAA ACGCCCCCCA 
6201 TCGCCACTAT ATACATACCC CCCCCTCTCC TCCCATCCCC 

CCAACCCTAC CACCACCACC ACCACCACCT CCACCTCCTC 

CCCCCTCGCT GCCGGACGAC 
6301 GCCTCCCCCC TCCCCCTCCG CCGCCGCCGC GCCGGTAACC 

ACCCCGCCCC TCTCCTCTTT CTTTCTCCGT TTTTTTTTTC 

CGTCTCGGTC TCGATCTTTG 

BamHI 



6401 GCCTTGGTAG TTTGGGTGGG CGAGAGGCGG CTTCGTGCGC 
GCCCAGATCG GTGCGCGGGA GGGGCGGGAT CTCGCGGCTG 
GGGCTCTCGC CGGCGTGGAT 
BamHI 

6501 CCGGCCCGGA TCTCGCGGGG AATGGGGCTC TCGGATGTAG 
ATCTGCGATC CGCCGTTGTT GGGGGAGATG ATGGGGGGTT 
TAAAATTTCC GCCATGCTAA 
6601 ACAAGATCAG GAAGAGGGGA AAAGGGCACT ATGGTTTATA

TTTTTATATA TTTCTGCTGC TTCGTCAGGC TTAGATGTGC 

TAGATCTTTC TTTCTTCTTT 
6701 TTGTGGGTAG AATTTGAATC CCTCAGCATT GTTCATCGGT 

AGTTTTTCTT TTCATGATTT GTGACAAATG CAGCCTCGTG 

CGGAGCTTTT TTGTAGGTAG 
NcoI

6801 ACCATGGCTT CTCCGGAGAG GAGACCAGTT GAGATTAGGC 

CAGCTACAGC AGCTGATATG GCCGCGGTTT GTGATATCGT

TAACCATTAC ATTGAGACGT 
6901 CTACAGTGAA CTTTAGGACA GAGCCACAAA CACCACAAGA 

GTGGATTGAT GATCTAGAGA GGTTGCAAGA TAGATACCCT 

TGGTTGGTTG CTGAGGTTGA 
7001 GGGTGTTGTG GCTGGTATTG CTTACGCTGG GCCCTGGAAG 

GCTAGGAACG CTTACGATTG GACAGTTGAG AGTACTGTTT 

ACGTGTCACA TAGGCATCAA 
BamHI 



7101 AGGTTGGGCC TAGGATCCAC ATTGTACACA CATTTGCTTA 

AGTCTATGGA GGCGCAAGGT TTTAAGTCTG TGGTTGCTGT 

TATAGGCCTT CCAAACGATC 
7201 CATCTGTTAG GTTGCATGAG GCTTTGGGAT ACACAGCCCG 

GGGTACATTG CGCGCAGCTG GATACAAGCA TGGTGGATGG 

CATGATGTTG GTTTTTGGCA 

SacI



7301 AAGGGATTTT GAGTTGCCAG CTCCTCCAAG GCCAGTTAGG
CCAGTTACCC AGATCTGAGG TACCCTGAGC TCGGTCGCAG 
CGTGTGCGTG TCCGTCGTAC 
FseI



7401 GTTCTGGCCG GCCGGGCCTT GGGCGCGCGA TCAGAAGCGT 

TGCGTTGGCG TGTGTGTGCT TCTGGTTTGC TTTAATTTTA 

CCAAGTTTGT TTCAAGGTGG 
7501 ATCGCGTGGT CAAGGCCCGT GTGCTTTAAA GACCCACCGG 

CACTGGCAGT GAGTGTTGCT GCTTGTGTAG GCTTTGGTAC 

GTATGGGCTT TATTTGCTTC 
7601 TGGATGTTGT GTACTACTTG GGTTTGTTGA ATTATTATGA 

GCAGTTGCGT ATTGTAATTC AGCTGGGCTA CCTGGACATT 

GTTATGTATT AATAAATGCT 

PmeI
7701 TTGCTTTCTT CTAAAGATCT TTAAGTGCTG AATTCATATT

TCCTCCTGCA GGGTTTAAAC TTGCCGTGGC CTATTTTCAG 

AAGAATTCCC AATAGTAGTC 
7801 CAAAATTTTT GTAACGAAGG GAGCATAATA GTTACATGCA 

AAGGAAAACT GCCATTCTTT AGAGGGGATG CTTGTTTAAG 

AACAAAAAAT ATATCACTTT 
7901 CTTTTGTTCC AAGTCATTGC GTATTTTTTT AAAAATATTT 

GTTCCTTCGT ATATTTCGAG CTTCAATCAC TTTATGGTTC 

TTTGTATTCT GGCTTTGCTG 
8001 TAAATCGTAG CTAACCTTCT TCCTAGCAGA AATTATTAAT 

ACTTGGGATA TTTTTTTAGA ATCAAGTAAA TTACATATTA
CCACCACATC GAGCTGCTTT
8101 TAAATTCATA TTACAGCCAT ATAGGCTTGA TTCATTTTGC 

AAAATTTCCA GGATATTGAC AACGTTAACT TAATAATATC 

TTGAAATATT AAAGCTATTA 
8201 TGATTAGGGG TGCAAATGGA CCGAGTTGGT TCGGTTTATA 

TCAAAATCAA ACCAAACCAA CTATATCGGT TTGGATTGGT 

TCGGTTTTGC CGGGTTTTCA 
8301 GCATTTTCTG GTTTTTTTTT TGTTAGATGA ATATTATTTT 

AATCTTACTT TGTCAAATTT TTGATAAGTA AATATATGTG 

TTAGTAAAAA TTAATTTTTT 
8401 TTACAAACAT ATGATCTATT AAAATATTCT TATAGGAGAA 

TTTTCTTAAT AACACATGAT ATTTATTTAT TTTAGTCGTT 

TGACTAATTT TTCGTTGATG 
8501 TACACTTTCA AAGTTAACCA AATTTAGTAA TTAAGTATAA 

AAATCAATAT GATACCTAAA TAATGATATG TTCTATTTAA 

TTTTAAATTA TCGAAATTTC 
8601 ACTTCAAATT CGAAAAAGAT ATATAAGAAT TTTGATAGAT 

TTTGACATAT GAATATGGAA GAACAAAGAG ATTGACGGAT 

TTTAGTAACA CTTGATAAGA 
8701 AAGTGATCGT ACAACCAATT ATTTAAAGTT AATAAAAATG 

GAGCACTTCA TATTTAACGA AATATTACAT GCCAGAAGAG 

TCGCAAATAT TTCTAGATAT 
8801 TTTTTAAAGA AAATTCTATA AAAAGTCTTA AAGGCATATA 

TATAAAAACT ATATATTTAT ATTTTGGTTT GGTTCGAATT 

TGTTTTACTC AATACCAAAC 
8901 TAAATTAGAC CAAATATAAT TGGGTTTTTA ATCGCGGCCC 

ACTAGTCACC GGTGTAGCTT GGCGTAATCA TGGTCATAGC 

TGTTTCCTGT GTGAAATTGT 
9001 TATCCGCTCA CAATTCCACA CAACATACGA GCCGGAAGCA 

TAAAGTGTAA AGCCTGGGGT GCCTAATGAG TGAGCTAACT 

CACATTAATT GCGTTGCGCT 
9101 CACTGCCCGC TTTCCAGTCG GGAAACCTGT CGTGCCAGCT 

GCATTAATGA ATCGGCCAAC GCGCGGGGAG AGGCGGTTTG 

CGTATTGGGC GCTCTTCCGC 
9201 TGCGCACGCT GCGCACGCTG CGCACGCTTC CTCGCTCACT 

GACTCGCTGC GCTCGGTCGT TCGGCTGCX3G CGAGCGGTAT 

CAGCTCACTC AAAGGCGGTA
9301 ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA
ACATGTGAGC AAAAGGCCAG CAAAAGGCCA GGAACCGTAA 
AAAGGCCGCG TTGCTGGCGT 
9401 TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA 
TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA 
TAAAGATACC AGGCGTTTCC 
9501 CCCTGGAAGC TCCCTCGTGC GCTCTCCTGT TCCGACCCTG 
CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA 
GCGTGGCGCT TTCTCATAGC 
9601 TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT 
CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA 
CCGCTGCGCC TTATCCGGTA 
9701 ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC 
GCCACTGGCA GCAGCCACTG GTAACAGGAT TAGCAGAGCG 
AGGTATGTAG GCGGTGCTAC 
9801 AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA 
AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA 
CCTTCGGAAA AAGAGTTGGT 
9901 AGCTCTTGAT CCGGCAAACA AACCACCGCT GGTAGCGGTG 
GTTTTTTTGT TTGCAAGCAG CAGATTACGC GCAGAAAAAA 
AGGATCTCAA GAAGATCCTT 
10001 TGATCTTTTC TACGGGGTCT GACGCTCAGT GGAACGAAAA
CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG 
ATCTTCACCT AGATCCTTTT 
10101 AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT 
GAGTAAACTT GGTCTGACAG TTACCAATGC TTAATCAGTG 
AGGCACCTAT CTCAGCGATC 
10201 TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG 
TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC 
CAGTGCTGCA ATGATACCGC 
10301 GAGACCCACG CTCACCGGCT CCAGATTTAT CAGCAATAAA 
CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA 
ACTTTATCCG CCTCCATCCA 
10401 GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG 
CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG 
GCATCGTGGT GTCACGCTCG 
10501 TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT 
CAAGGCGAGT TACATGATCC CCCATGTTGT GCAAAAAAGC 
GGTTAGCTCC TTCGGTCCTC 
10601 CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT
CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG 
CCATCCGTAA GATGCTTTTC 
10701 TGTGACTGGT GAGTACTCAA CCAAGTCATT CTGAGAATAG 
TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC
GGGATAATAC CGCGCCACAT 
10801 AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT
CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC 
CAGTTCGATG TAACCCACTC 



FIG. 8B CONT.
10901 GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG

CGTTTCTGGG TGAGCAAAAA CAGGAAGGCA AAATGCCGCA

AAAAAGGGAA TAAGGGCGAC
11001 ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT

TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT

ACATATTTGA ATGTATTTAG
11101 AAAAATAAAC AAATAGGGGT TCCGCGCACA TTTCCCCGAA

AAGTGCCACC TGACGTCTAA GAAACCATTA TTATCATGAC

ATTAACCTAT AAAAATAGGC
11201 GTATCACGAG GCCCTTTCGT CTCGCGCGTT TCGGTGATGA

CGGTGAAAAC CTCTGACACA TGCAGCTCCC GGAGACGGTC

ACAGCTTGTC TGTAAGCGGA
11301 TGCCGGGAGC AGACAAGCCC GTCAGGGCGC GTCAGCGGGT

GTTGGCGGGT GTCGGGGCTG GCTTAACTAT GCGGCATCAG

AGCAGATTGT ACTGAGAGTG
11401 CACCATATGC GGTGTGAAAT ACCGCACAGA TGCGTAAGGA

GAAAATACCG CATCAGGCGC CATTCGCCAT TCAGGCTGCG

CAACTGTTGG GAAGGGCGAT
11501 CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG

GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG

TTTTCCCAGT CACGACGTTG
11601 TAAAACGACG GCCAGTGAAT TACACCGGTG TGATCATGGG

CCG



FIG. 8B CONT.
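The map lists RB7 MAR v3 twice (1-1164 and 7771-8934). Comparing a short stretch of each copy suggests the two copies sit in opposite orientation: the excerpt at positions 801-834 is the reverse complement of the excerpt at 8101-8134. The Python sketch below checks only these two 34 nt excerpts, copied from FIG. 8B; it is not a full alignment, and `partner_position` is a hypothetical helper that assumes the copies are exact, equal-length inverted repeats.

```python
# Sketch: compare short excerpts of the two RB7 MAR v3 copies in FIG. 8B.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only, case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def partner_position(p: int, copy1_end: int = 1164, copy2_start: int = 7771) -> int:
    """Map position p in the first MAR copy to the paired position in the second,
    assuming (hypothetically) that the copies are exact inverted repeats."""
    return copy2_start + (copy1_end - p)


copy1_801 = "TGAATCAAGCCTATATGGCTGTAATATGAATTTA"   # FIG. 8B, positions 801-834
copy2_8101 = "TAAATTCATATTACAGCCATATAGGCTTGATTCA"  # FIG. 8B, positions 8101-8134

print(reverse_complement(copy2_8101) == copy1_801)       # True
print(partner_position(801), partner_position(834))      # 8134 8101
```

The mapped endpoints land exactly on the second excerpt (834 pairs with 8101, 801 with 8134), which is what inverted copies of equal length would predict.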
atgaagatgagacgctacaagctctttctcatgttctgtatggccggcctgtgcctcatc

tccttcctgcacttcttcaagaccctgtcctatgtcaccttcccccgagaactggcctcc

ctcagccctaacctggtgtccagctttttctggaacaatgccccggtcacgccccaggcc

agccccgagccaggaggccctgacctgctgcgtaccccactctactcccactcgcccctg

ctgcagccgctgccgcccagcaaggcggccgaggagctccaccgggtggacttggtgctg

cccgaggacaccaccgagtatttcgtgcgcaccaaggccggcggcgtctgcttcaaaccc

ggcaccaagatgctggagaggccgcccccgggacggccggaggagaagcctgagggggcc

aacggctcctcggcccggcggccaccccggtacctcctgagcgcccgggagcgcacgggg

ggccgaggcgcccggcgcaagtgggtggagtgcgtgtgcctgcccggctggcacggaccc

agctgcggcgtgcccactgtggtgcagtactccaacctgcccaccaaggagcggctggtg

cccagggaggtgccgcgccgcgtcatcaacgccatcaacgtcaaccacgagttcgacctg

ctggacgtgcgcttccacgagctgggcgacgtggtggacgcctttgtggtgtgcgagtcc

aacttcacggcttatggggagccgcggccgctcaagttccgggagatgctgaccaatggc

accttcgagtacatccgccacaaggtgctctatgtcttcctggaccacttcccgcccggc

ggccggcaggacggctggatcgccgacgactacctgcgcaccttcctcacccaggacggc

gtctcgcggctgcgcaacctgcggcccgacgacgtcttcatcattgacgatgcggacgag

atcccggcccgtgacggcgtccttttcctcaagctctacgatggctggaccgagcccttc

gccttccacatgcgcaagtcgctctacggcttcttctggaagcagccgggcaccctggag

gtggtgtcaggctgcacggtggacatgctgcaggcagtgtatgggctggacggcatccgc

ctgcgccgccgccagtactacaccatgcccaacttcagacagtatgagaaccgcaccggc

cacatcctggtgcagtggtcgctgggcagccccctgcacttcgccggctggcactgctcc

tggtgcttcacgcccgagggcatctacttcaagctcgtgtccgcccagaatggcgacttc

ccacgctggggtgactacgaggacaagcgggacctgaactacatccgcggcctgatccgc

accgggggctggttcgacggcacgcagcaggagtacccgcctgcagaccccagcgagcac

atgtatgcgcccaagtacctgctgaagaactacgaccggttccactacctgctggacaac

ccctaccaggagcccaggagcacggcggcgggcgggtggcgccacaggggtcccgaggga

aggccgcccgcccggggcaaactggacgaggcggaagtctag



FIG. 10 
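FIG. 10 begins ATG AAG ATG AGA CGC (Met-Lys-Met-Arg-Arg), whereas SEQ ID NO: 1 begins CCATGGTGATGAGACGC, where the start codon lies inside a CCATGG (NcoI) site and the second codon reads GTG (Val). The Python sketch below makes the difference explicit and confirms that the FIG. 10 reading frame ends ...Leu-Asp-Glu-Ala-Glu-Val followed by a TAG stop. It assumes the standard genetic code (NCBI translation table 1), which the document does not name; the sequence excerpts are copied from FIG. 10 and SEQ ID NO: 1.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest, so the 64-letter string lines up with product().
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame 1; '*' marks a stop; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


fig10_5prime = "atgaagatgagacgctacaagctc"     # FIG. 10, first 24 nt
seq1_5prime = "ccatggtgatgagacgctacaagctc"    # SEQ ID NO: 1, first 26 nt
fig10_3prime = "ctggacgaggcggaagtctag"        # FIG. 10, last 21 nt

print(translate(fig10_5prime))       # MKMRRYKL
print(translate(seq1_5prime[2:]))    # MVMRRYKL (coding sequence starts at base 3)
print(translate(fig10_3prime))       # LDEAEV*
```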
SEQUENCE LISTING

<110> PLANT RESEARCH INTERNATIONAL BV
BAKKER, Hendrikus A.C.
FLORACK, Dionisius E.A.
BOSCH, Hendrik J.

<120> GnTIII expression in plants

<130> bHflt2A - PD33313U0 

<150> US-bO/3b5-.7li') 
<1SX> 2002-D3-n 

<150> US 60/368,047
<151> 2002-03-26

<160> 27

<170> PatentIn version 3.2



<210> 1

<211> 1642

<212> DNA

<213> Homo sapiens

<400> 1

ccatggtgat gagacgctac aagctctttc tcatgttctg tatggccggc ctgtgcctca 60

tctccttcct gcacttcttc aagaccctgt cctatgtcac cttcccccga gaactggcct 120

ccctcagccc taacctggtg tccagctttt tctggaacaa tgccccggtc acgccccagg 180

ccagccccga gccaggaggc cctgacctgc tgcgtacccc actctactcc cactcgcccc 240

tgctgcagcc gctgccgccc agcaaggcgg ccgaggagct ccaccgggtg gacttggtgc 300

tgcccgagga caccaccgag tatttcgtgc gcaccaaggc cggcggcgtc tgcttcaaac 360

ccggcaccaa gatgctggag aggccgcccc cgggacggcc ggaggagaag cctgaggggg 420

ccaacggctc ctcggcccgg cggccacccc ggtacctcct gagcgcccgg gagcgcacgg 480

ggggccgagg cgcccggcgc aagtgggtgg agtgcgtgtg cctgcccggc tggcacggac 540

ccagctgcgg cgtgcccact gtggtgcagt actccaacct gcccaccaag gagcggctgg 600

tgcccaggga ggtgccgcgc cgcgtcatca acgccatcaa cgtcaaccac gagttcgacc 660

tgctggacgt gcgcttccac gagctgggcg acgtggtgga cgcctttgtg gtgtgcgagt 720

ccaacttcac ggcttatggg gagccgcggc cgctcaagtt ccgggagatg ctgaccaatg 780

gcaccttcga gtacatccgc cacaaggtgc tctatgtctt cctggaccac ttcccgcccg 840

gcggccggca ggacggctgg atcgccgacg actacctgcg caccttcctc acccaggacg 900

gcgtctcgcg gctgcgcaac ctgcggcccg acgacgtctt catcattgac gatgcggacg 960

agatcccggc ccgtgacggc gtccttttcc tcaagctcta cgatggctgg accgagccct 1020

tcgccttcca catgcgcaag tcgctctacg gcttcttctg gaagcagccg ggcaccctgg 1080

aggtggtgtc aggctgcacg gtggacatgc tgcaggcagt gtatgggctg gacggcatcc 1140

gcctgcgccg ccgccagtac tacaccatgc ccaacttcag acagtatgag aaccgcaccg 1200

gccacatcct ggtgcagtgg tcgctgggca gccccctgca cttcgccggc tggcactgct 1260

cctggtgctt cacgcccgag ggcatctact tcaagctcgt gtccgcccag aatggcgact 1320

tcccacgctg gggtgactac gaggacaagc gggacctgaa ctacatccgc ggcctgatcc 1380

gcaccggggg ctggttcgac ggcacgcagc aggagtaccc gcctgcagac cccagcgagc 1440

acatgtatgc gcccaagtac ctgctgaaga actacgaccg gttccactac ctgctggaca 1500

acccctacca ggagcccagg agcacggcgg cgggcgggtg gcgccacagg ggtcccgagg 1560

gaaggccgcc cgcccggggc aaactggacg aggcggaagt cgaacaaaaa ctcatctcag 1620

aagaggatct gaattaggat cc 1642



<210> 2 

<211> 544

<212> PRT

<213> Homo sapiens

<400> 2

Met Val Met Arg Arg Tyr Lys Leu Phe Leu Met Phe Cys Met Ala Gly
1 5 10 15

Leu Cys Leu Ile Ser Phe Leu His Phe Phe Lys Thr Leu Ser Tyr Val
20 25 30

Thr Phe Pro Arg Glu Leu Ala Ser Leu Ser Pro Asn Leu Val Ser Ser
35 40 45

Phe Phe Trp Asn Asn Ala Pro Val Thr Pro Gln Ala Ser Pro Glu Pro
50 55 60

Gly Gly Pro Asp Leu Leu Arg Thr Pro Leu Tyr Ser His Ser Pro Leu
65 70 75 80

Leu Gln Pro Leu Pro Pro Ser Lys Ala Ala Glu Glu Leu His Arg Val
85 90 95

Asp Leu Val Leu Pro Glu Asp Thr Thr Glu Tyr Phe Val Arg Thr Lys
100 105 110

Ala Gly Gly Val Cys Phe Lys Pro Gly Thr Lys Met Leu Glu Arg Pro
115 120 125

Pro Pro Gly Arg Pro Glu Glu Lys Pro Glu Gly Ala Asn Gly Ser Ser
130 135 140

Ala Arg Arg Pro Pro Arg Tyr Leu Leu Ser Ala Arg Glu Arg Thr Gly
145 150 155 160

Gly Arg Gly Ala Arg Arg Lys Trp Val Glu Cys Val Cys Leu Pro Gly
165 170 175

Trp His Gly Pro Ser Cys Gly Val Pro Thr Val Val Gln Tyr Ser Asn
180 185 190

Leu Pro Thr Lys Glu Arg Leu Val Pro Arg Glu Val Pro Arg Arg Val
195 200 205

Ile Asn Ala Ile Asn Val Asn His Glu Phe Asp Leu Leu Asp Val Arg
210 215 220

Phe His Glu Leu Gly Asp Val Val Asp Ala Phe Val Val Cys Glu Ser
225 230 235 240

Asn Phe Thr Ala Tyr Gly Glu Pro Arg Pro Leu Lys Phe Arg Glu Met
245 250 255

Leu Thr Asn Gly Thr Phe Glu Tyr Ile Arg His Lys Val Leu Tyr Val
260 265 270

Phe Leu Asp His Phe Pro Pro Gly Gly Arg Gln Asp Gly Trp Ile Ala
275 280 285

Asp Asp Tyr Leu Arg Thr Phe Leu Thr Gln Asp Gly Val Ser Arg Leu
290 295 300

Arg Asn Leu Arg Pro Asp Asp Val Phe Ile Ile Asp Asp Ala Asp Glu
305 310 315 320

Ile Pro Ala Arg Asp Gly Val Leu Phe Leu Lys Leu Tyr Asp Gly Trp
325 330 335

Thr Glu Pro Phe Ala Phe His Met Arg Lys Ser Leu Tyr Gly Phe Phe
340 345 350

Trp Lys Gln Pro Gly Thr Leu Glu Val Val Ser Gly Cys Thr Val Asp
355 360 365

Met Leu Gln Ala Val Tyr Gly Leu Asp Gly Ile Arg Leu Arg Arg Arg
370 375 380

Gln Tyr Tyr Thr Met Pro Asn Phe Arg Gln Tyr Glu Asn Arg Thr Gly
385 390 395 400

His Ile Leu Val Gln Trp Ser Leu Gly Ser Pro Leu His Phe Ala Gly
405 410 415

Trp His Cys Ser Trp Cys Phe Thr Pro Glu Gly Ile Tyr Phe Lys Leu
420 425 430

Val Ser Ala Gln Asn Gly Asp Phe Pro Arg Trp Gly Asp Tyr Glu Asp
435 440 445

Lys Arg Asp Leu Asn Tyr Ile Arg Gly Leu Ile Arg Thr Gly Gly Trp
450 455 460

Phe Asp Gly Thr Gln Gln Glu Tyr Pro Pro Ala Asp Pro Ser Glu His
465 470 475 480

Met Tyr Ala Pro Lys Tyr Leu Leu Lys Asn Tyr Asp Arg Phe His Tyr
485 490 495

Leu Leu Asp Asn Pro Tyr Gln Glu Pro Arg Ser Thr Ala Ala Gly Gly
500 505 510

Trp Arg His Arg Gly Pro Glu Gly Arg Pro Pro Ala Arg Gly Lys Leu
515 520 525

Asp Glu Ala Glu Val Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn
530 535 540
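SEQ ID NO: 2 should be the translation of the coding sequence in SEQ ID NO: 1, which starts at base 3 (inside the CCATGG site) and ends at a TAG stop. The Python sketch below translates the first and last in-frame stretches of SEQ ID NO: 1 and compares them with the first and last rows of SEQ ID NO: 2. The standard genetic code (NCBI table 1) is assumed, and the helper names are illustrative; the excerpts are copied from the listing.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate in frame 1, stopping at the first stop codon (not included)."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_one_letter(row: str) -> str:
    """Convert a row of three-letter residue codes to one-letter codes."""
    return "".join(THREE_TO_ONE[r] for r in row.split())


# SEQ ID NO: 1, bases 1-50 (coding sequence starts at base 3).
seq1_head = "ccatggtgatgagacgctacaagctctttctcatgttctgtatggccggc"
# In-frame 3' end of the coding sequence in SEQ ID NO: 1, through the TAG stop.
seq1_tail = "ctggacgaggcggaagtcgaacaaaaactcatctcagaagaggatctgaattag"

first_row = "Met Val Met Arg Arg Tyr Lys Leu Phe Leu Met Phe Cys Met Ala Gly"
last_row = "Asp Glu Ala Glu Val Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Asn"

print(translate(seq1_head[2:]) == to_one_letter(first_row))    # True
print(translate(seq1_tail)[1:] == to_one_letter(last_row))     # True
```

The eleven residues after Val533 (Glu-Gln-Lys-Leu-Ile-Ser-Glu-Glu-Asp-Leu-Asn) contain the c-Myc epitope sequence EQKLISEEDL.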



<210> 3

<211> 31

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic

<400> 3

atactcgagt taacaatgaa gatgagacgc t 



<210> 4

<211> 35

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic

<400> 4

tatggatcct aattcagatc ctcttctgag atgag 



<21D> 5 

<211> 20 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Synthetic

<400> 5

ccatggtgat gagacgctac 



<210> 6

<211> 32

<212> DNA

<213> Artificial Sequence
<220>

<223> Synthetic

<400> 6
gtttaaacct aggatcctaa ttcagatcct ct
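The lengths given under <211> and the four synthetic sequences can be cross-checked against SEQ ID NO: 1 with a few lines of Python. SEQ ID NO: 5 matches the first 20 bases of SEQ ID NO: 1; the reverse complements of SEQ ID NOs: 4 and 6 contain the 3' end of SEQ ID NO: 1 (through the TAG stop and the GGATCC site); and SEQ ID NO: 3 ends in the ATGAAGATGAGACGCT start shared with FIG. 10. Reading SEQ ID NOs: 3 and 5 as forward primers and 4 and 6 as reverse primers is an inference from these matches, not something the listing states.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only, case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def clean(listing_seq: str) -> str:
    """Remove the spaces between 10-nt blocks used in the listing."""
    return "".join(listing_seq.split()).lower()


seq1_first_30 = clean("ccatggtgat gagacgctac aagctctttc")
seq1_last_22 = clean("aagaggatct gaattaggat cc")    # last 22 nt of SEQ ID NO: 1
seq3 = clean("atactcgagt taacaatgaa gatgagacgc t")
seq4 = clean("tatggatcct aattcagatc ctcttctgag atgag")
seq5 = clean("ccatggtgat gagacgctac")
seq6 = clean("gtttaaacct aggatcctaa ttcagatcct ct")

print([len(s) for s in (seq3, seq4, seq5, seq6)])             # [31, 35, 20, 32]
print(seq1_first_30.startswith(seq5))                          # True
print(seq3[15:])                                               # atgaagatgagacgct
print(seq1_last_22 in reverse_complement(seq4))                # True
print(reverse_complement(seq6).startswith(seq1_last_22[1:]))   # True
```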






<210> 7

<211> 1602

<212> DNA

<213> Homo sapiens

<400> 7

atgaagatga gacgctacaa gctctttctc atgttctgta tggccggcct gtgcctcatc 60

tccttcctgc acttcttcaa gaccctgtcc tatgtcacct tcccccgaga actggcctcc 120

ctcagcccta acctggtgtc cagctttttc tggaacaatg ccccggtcac gccccaggcc 180

agccccgagc caggaggccc tgacctgctg cgtaccccac tctactccca ctcgcccctg 240

ctgcagccgc tgccgcccag caaggcggcc gaggagctcc accgggtgga cttggtgctg 300

cccgaggaca ccaccgagta tttcgtgcgc accaaggccg gcggcgtctg cttcaaaccc 360

ggcaccaaga tgctggagag gccgcccccg ggacggccgg aggagaagcc tgagggggcc 420

aacggctcct cggcccggcg gccaccccgg tacctcctga gcgcccggga gcgcacgggg 480

ggccgaggcg cccggcgcaa gtgggtggag tgcgtgtgcc tgcccggctg gcacggaccc 540

agctgcggcg tgcccactgt ggtgcagtac tccaacctgc ccaccaagga gcggctggtg 600

cccagggagg tgccgcgccg cgtcatcaac gccatcaacg tcaaccacga gttcgacctg 660

ctggacgtgc gcttccacga gctgggcgac gtggtggacg cctttgtggt gtgcgagtcc 720

aacttcacgg cttatgggga gccgcggccg ctcaagttcc gggagatgct gaccaatggc 780

accttcgagt acatccgcca caaggtgctc tatgtcttcc tggaccactt cccgcccggc 840

ggccggcagg acggctggat cgccgacgac tacctgcgca ccttcctcac ccaggacggc 900

gtctcgcggc tgcgcaacct gcggcccgac gacgtcttca tcattgacga tgcggacgag 960

atcccggccc gtgacggcgt ccttttcctc aagctctacg atggctggac cgagcccttc 1020

gccttccaca tgcgcaagtc gctctacggc ttcttctgga agcagccggg caccctggag 1080

gtggtgtcag gctgcacggt ggacatgctg caggcagtgt atgggctgga cggcatccgc 1140

ctgcgccgcc gccagtacta caccatgccc aacttcagac agtatgagaa ccgcaccggc 1200

cacatcctgg tgcagtggtc gctgggcagc cccctgcact tcgccggctg gcactgctcc 1260

tggtgcttca cgcccgaggg catctacttc aagctcgtgt ccgcccagaa tggcgacttc 1320

ccacgctggg gtgactacga ggacaagcgg gacctgaact acatccgcgg cctgatccgc 1380

accgggggct ggttcgacgg cacgcagcag gagtacccgc ctgcagaccc cagcgagcac 1440

atgtatgcgc ccaagtacct gctgaagaac tacgaccggt tccactacct gctggacaac 1500

ccctaccagg agcccaggag cacggcggcg ggcgggtggc gccacagggg tcccgaggga 1560

aggccgcccg cccggggcaa actggacgag gcggaagtct ag 1602



<210> 8

<211> 7027

<212> DNA

<213> Artificial Sequence

<220>

<223> Synthetic

<400> 8

catgattacg ccaagctagc ggccgcattc ccgggaagct aggccaccgt ggcccgcctg 60

caggggaagc ttgcatgcct gcagatcccc ggggatcctc tagagtcgac ctgcagtgca 120

gcgtgacccg gtcgtgcccc tctctagaga taatgagcat tgcatgtcta agttataaaa 180

aattaccaca tatttttttt gtcacacttg tttgaagtgc agtttatcta tctttataca 240

tatatttaaa ctttaatcta cgaataatat aatctatagt actacaataa tatcagtgtt 300

ttagagaatc atataaatga acagttagac atggtctaaa ggacaattga gtattttgac 360

aacaggactc tacagtttta tctttttagt gtgcatgtgt tctccttttt ttttgcaaat 420

agcttcacct atataatact tcatccattt tattagtaca tccatttagg gtttagggtt 480

aatggttttt atagactaat ttttttagta catctatttt attctatttt agcctctaaa 540

ttaagaaaac taaaactcta ttttagtttt tttatttaat aatttagata taaaatagaa 600

taaaataaag tgactaaaaa ttaaacaaat accctttaag aaattaaaaa aactaaggaa 660

acatttttct tgtttcgagt agataatgcc agcctgttaa acgccgtcga cgagtctaac 720

ggacaccaac cagcgaacca gcagcgtcgc gtcgggccaa gcgaagcaga cggcacggca 780

tctctgtcgc tgcctctgga cccctctcga gagttccgct ccaccgttgg acttgctccg 840

ctgtcggcat ccagaaattg cgtggcggag cggcagacgt gagccggcac ggcaggcggc 900

ctcctcctcc tctcacggca cggcagctac gggggattcc tttcccaccg ctccttcgct 960

ttcccttcct cgcccgccgt aataaataga caccccctcc acaccctctt tccccaacct 1020

cgtgttgttc ggagcgcaca cacacacaac cagatctccc ccaaatccac ccgtcggcac 1080

ctccgcttca aggtacgccg ctcgtcctcc cccccccccc ctctctacct tctctagatc 1140

ggcgttccgg tccatgcatg gttagggccc ggtagttcta cttctgttca tgtttgtgtt 1200

agatccgtgt ttgtgttaga tccgtgctgc tagcgttcgt acacggatgc gacctgtacg 1260

tcagacacgt tctgattgct aacttgccag tgtttctctt tggggaatcc tgggatggct 1320

ctagccgttc cgcagacggg atcgatttca tgattttttt tgtttcgttg catagggttt 1380

ggtttgccct tttcctttat ttcaatatat gccgtgcact tgtttgtcgg gtcatctttt 1440

catgcttttt tttgtcttgg ttgtgatgat gtggtctggt tgggcggtcg ttctagatcg 1500

gagtagaatt ctgtttcaaa ctacctggtg gatttattaa ttttggatct gtatgtgtgt 1560

gccatacata ttcatagtta cgaattgaag atgatggatg gaaatatcga tctaggatag 1620

gtatacatgt tgatgcgggt tttactgatg catatacaga gatgcttttt gttcgcttgg 1680

ttgtgatgat gtggtgtggt tgggcggtcg ttcattcgtt ctagatcgga gtagaatact 1740

gtttcaaact acctggtgta tttattaatt ttggaactgt atgtgtgtgt catacatctt 1800

catagttacg agtttaagat ggatggaaat atcgatctag gataggtata catgttgatg 1860

tgggttttac tgatgcatat acatgatggc atatgcagca tctattcata tgctctaacc 1920

ttgagtacct atctattata ataaacaagt atgttttata attattttga tcttgatata 1980

cttggatgat ggcatatgca gcagctatat gtggattttt ttagccctgc cttcatacgc 2040

tatttatttg cttggtactg tttcttttgt cgatgctcac cctgttgttt ggtgttactt 2100

ctgcagggta cccccggggt cgaccatggt aaggggcagc caccaccacc accaccacat 2160

ggtccgtcct gtagaaaccc caacccgtga aatcaaaaaa ctcgacggcc tgtgggcatt 2220

cagtctggat cgcgaaaact gtggaattga tcagcgttgg tgggaaagcg cgttacaaga 2280

aagccgggca attgctgtgc caggcagttt taacgatcag ttcgccgatg cagatattcg 2340

taattatgcg ggcaacgtct ggtatcagcg cgaagtcttt ataccgaaag gttgggcagg 2400

ccagcgtatc gtgctgcgtt tcgatgcggt cactcattac ggcaaagtgt gggtcaataa 2460

tcaggaagtg atggagcatc agggcggcta tacgccattt gaagccgatg tcacgccgta 2520

tgttattgcc gggaaaagtg tacgtatcac cgtttgtgtg aacaacgaac tgaactggca 2580

gactatcccg ccgggaatgg tgattaccga cgaaaacggc aagaaaaagc agtcttactt 2640

ccatgatttc tttaactatg ccggaatcca tcgcagcgta atgctctaca ccacgccgaa 2700

cacctgggtg gacgatatca ccgtggtgac gcatgtcgcg caagactgta accacgcgtc 2760

tgttgactgg caggtggtgg ccaatggtga tgtcagcgtt gaactgcgtg atgcggatca 2820

acaggtggtt gcaactggac aaggcactag cgggactttg caagtggtga atccgcacct 2880

ctggcaaccg ggtgaaggtt atctctatga actgtgcgtc acagccaaaa gccagacaga 2940

gtgtgatatc tacccgcttc gcgtcggcat ccggtcagtg gcagtgaagg gcgaacagtt 3000

cctgattaac cacaaaccgt tctactttac tggctttggt cgtcatgaag atgcggactt 3060

acgtggcaaa ggattcgata acgtgctgat ggtgcacgac cacgcattaa tggactggat 3120

tggggccaac tcctaccgta cctcgcatta cccttacgct gaagagatgc tcgactgggc 3180

agatgaacat ggcatcgtgg tgattgatga aactgctgct gtcggcttta acctctcttt 3240

aggcattggt ttcgaagcgg gcaacaagcc gaaagaactg tacagcgaag aggcagtcaa 3300

cggggaaact cagcaagcgc acttacaggc gattaaagag ctgatagcgc gtgacaaaaa 3360

ccacccaagc gtggtgatgt ggagtattgc caacgaaccg gatacccgtc cgcaagtgca 3420

cgggaatatt tcgccactgg cggaagcaac gcgtaaactc gacccgacgc gtccgatcac 3480

ctgcgtcaat gtaatgttct gcgacgctca caccgatacc atcagcgatc tctttgatgt 3540

gctgtgcctg aaccgttatt acggatggta tgtccaaagc ggcgatttgg aaacggcaga 3600

gaaggtactg gaaaaagaac ttctggcctg gcaggagaaa ctgcatcagc cgattatcat 3660

caccgaatac ggcgtggata cgttagccgg gctgcactca atgtacaccg acatgtggag 3720

tgaagagtat cagtgtgcat ggctggatat gtatcaccgc gtctttgatc gcgtcagcgc 3780

cgtcgtcggt gaacaggtat ggaatttcgc cgattttgcg acctcgcaag gcatattgcg 3840

cgttggcggt aacaagaaag ggatcttcac tcgcgaccgc aaaccgaagt cggcggcttt 3900

tctgctgcaa aaacgctgga ctggcatgaa cttcggtgaa aaaccgcagc agggaggcaa 3960

acaatgataa tgagctcgtt taaactgagg gcactgaagt cgcttgatgt gctgaattgt 4020

ttgtgatgtt ggtggcgtat tttgtttaaa taagtaagca tggctgtgat tttatcatat 4080

gatcgatctt tggggtttta tttaacacat tgtaaaatgt gtatctatta ataactcaat 4140

gtataagatg tgttcattct tcggttgcca tagatctgct tatttgacct gtgatgtttt 4200

gactccaaaa accaaaatca caactcaata aactcatgga atatgtccac ctgtttcttg 4260

aagagttcat ctaccattcc agttggcatt tatcagtgtt gcagcggcgc tgtgctttgt 4320

aacataacaa ttgttacggc atatatccaa cggccggcct agctagccac ggtggccaga 4380

tccactagtt ctagagcggc cgcttaattc actggccgtc gttttacaac gtcgtgactg 4440

ggaaaaccct ggcgttaccc aacttaatcg ccttgcagca catccccctt tcgccagctg 4500

gcgtaatagc gaagaggccc gcaccgatcg cccttcccaa cagttgcgca gcctgaatgg 4560

cgaatggcgc ctgatgcggt attttctcct tacgcatctg tgcggtattt cacaccgcat 4620

atggtgcact ctcagtacaa tctgctctga tgccgcatag ttaagccagc cccgacaccc 4680

gccaacaccc gctgacgcgc cctgacgggc ttgtctgctc ccggcatccg cttacagaca 4740

agctgtgacc gtctccggga gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg 4800

cgcgagacga aagggcctcg tgatacgcct atttttatag gttaatgtca tgataataat 4860

ggtttcttag acgtcaggtg gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt 4920

atttttctaa atacattcaa atatgtatcc gctcatgaga caataaccct gataaatgct 4980

tcaataatat tgaaaaagga agagtatgag tattcaacat ttccgtgtcg cccttattcc 5040

cttttttgcg gcattttgcc ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa 5100

agatgctgaa gatcagttgg gtgcacgagt gggttacatc gaactggatc tcaacagcgg 5160

taagatcctt gagagttttc gccccgaaga acgttttcca atgatgagca cttttaaagt 5220

tctgctatgt ggcgcggtat tatcccgtat tgacgccggg caagagcaac tcggtcgccg 5280

catacactat tctcagaatg acttggttga gtactcacca gtcacagaaa agcatcttac 5340

ggatggcatg acagtaagag aattatgcag tgctgccata accatgagtg ataacactgc 5400

ggccaactta cttctgacaa cgatcggagg accgaaggag ctaaccgctt ttttgcacaa 5460

catgggggat catgtaactc gccttgatcg ttgggaaccg gagctgaatg aagccatacc 5520

aaacgacgag cgtgacacca cgatgcctgt agcaatggca acaacgttgc gcaaactatt 5580

aactggcgaa ctacttactc tagcttcccg gcaacaatta atagactgga tggaggcgga 5640

taaagttgca ggaccacttc tgcgctcggc ccttccggct ggctggttta ttgctgataa 5700
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atctggagcc ggtgagcgtg ggtctcgcgg tatcattgca gcactggggc cagatggtaa S7tD 

gccctcccgt atcgtagtta tctacacgac ggggagtcag gcaactatgg atgaacgaaa S6BD 

tagacagatc gctgagatag gtgcctcact gattaagcat tggtaactgt cagaccaagt 5&A0 

ttactcatat atactttaga ttgatttaaa acttcatttt taatttaaaa ggatctaggt Smo 

gaagatcctt tttgataatc tcatgaccaa aatcccttaa cgtgagtttt cgttccactg liQDO 

agcgtcagac cccgtagaaa agatcaaagg atcttcttga gatccttttt ttctgcgcgt bObO 

aatctgctgc ttgcaaacaa aaaaaccacc gctaccagcg gtggtttgtt tgccggatca blBD 

agagctacca actctttttc cgaaggtaac tggcttcagc agagcgcaga taccaaatac blAO 

tgtccttcta gtgtagccgt agttaggcca ccacttcaag aactctgtag caccgcctac bSMD 

atacctcgct ctgctaatcc tgttaccagt ggctgctgcc agtggcgata agtcgtgtct b3D0 

taccgggttg gactcaagac gatagttacc ggataaggcg cagcggtcgg gctgaacggg bBbQ 

gggttcgtgc acacagccca gcttggagcg aacgacctac accgaactga gatacctaca bMSO 

gcgtgagcat tgagaaagcg ccacgcttcc cgaagggaga aaggcggaca ggtatccggt bM&O 

aagcggcagg gtcggaacag gagagcgcac gagggagctt ccagggggaa acgcctggta bSMO 

tctttatagt cctgtcgggt ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc bbOO 

gtcagggggg cggagcctat ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc bbbD 

cttttgctgg ccttttgctc acatgttctt tcctgcgtta tcccctgatt ctgtggataa b72D 

ccgtattacc gcctttgagt gagctgatac cgctcgccgc agccgaacga ccgagcgcag b7fl0 

cgagtcagtg agcgaggaag cggaagagcg cccaatacgc aaaccgcctc tccccgcgcg bAMD 

ttggccgatt cattaatgca gctggcacga caggtttccc gactggaaag cgggcagtga bSDD 

gcgcaacgca attaatgtga gttagctcac tcattaggca ccccaggctt tacactttat bIfcD 

gcttccggct cgtatgttgt gtggaattgt gagcggataa caatttcaca caggaaacag 7DH0 

ctatgac 702"? 
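In this listing each full row holds 60 nucleotides in six groups of ten, followed by the position of the row's last residue; the closing row of a sequence may be shorter (for example, 7 residues ending at 7027 above). The following is a minimal Python sketch, assuming the rows have already been separated one per line, that checks whether the printed positions agree with the running residue count. The function names `parse_row` and `check_rows` are hypothetical and are not part of the application.

```python
def parse_row(line):
    """Split a listing row into (residue_count, printed_position).

    Returns None for tag lines such as "<210> 9", blank lines, or rows
    without a numeric trailing coordinate.
    """
    parts = line.split()
    if len(parts) < 2 or not parts[-1].isdigit():
        return None
    groups = parts[:-1]
    # Every group before the coordinate must consist of nucleotide codes only
    # ("n" is the ST.25 code for an unspecified base).
    if not all(g and set(g) <= set("acgtn") for g in groups):
        return None
    return sum(len(g) for g in groups), int(parts[-1])


def check_rows(lines, start=0):
    """Compare each row's printed position with the running residue count.

    `start` is the residue count before the first row. Returns a list of
    (line, expected, printed) tuples for rows that disagree.
    """
    total = start
    mismatches = []
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue  # skip tags, blank lines and unparsable text
        count, printed = row
        total += count
        if printed != total:
            mismatches.append((line, total, printed))
            total = printed  # resynchronise on the printed coordinate
    return mismatches


if __name__ == "__main__":
    rows = [
        "gcttccggct cgtatgttgt gtggaattgt gagcggataa caatttcaca caggaaacag 7020",
        "ctatgac 7027",
    ]
    print(check_rows(rows, start=6960))
```

Resynchronising on the printed coordinate after a mismatch keeps a single bad row from flagging every row that follows it.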



<210> 9

<211> 6818

<212> DNA

<213> Artificial Sequence

<220>

<223> Synthetic

<400> 9

cctgcagatc cccggggatc ctctagagtc gacctgcagt gcagcgtgac ccggtcgtgc 60

ccctctctag agataatgag cattgcatgt ctaagttata aaaaattacc acatattttt 120

tttgtcacac ttgtttgaag tgcagtttat ctatctttat acatatattt aaactttaat 180

ctacgaataa tataatctat agtactacaa taatatcagt gttttagaga atcatataaa 240

tgaacagtta gacatggtct aaaggacaat tgagtatttt gacaacagga ctctacagtt 300

ttatcttttt agtgtgcatg tgttctcctt tttttttgca aatagcttca cctatataat 360

acttcatcca ttttattagt acatccattt agggtttagg gttaatggtt tttatagact 420

aattttttta gtacatctat tttattctat tttagcctct aaattaagaa aactaaaact 480

ctattttagt ttttttattt aataatttag atataaaata gaataaaata aagtgactaa 540

aaattaaaca aatacccttt aagaaattaa aaaaactaag gaaacatttt tcttgtttcg 600

agtagataat gccagcctgt taaacgccgt cgacgagtct aacggacacc aaccagcgaa 660

ccagcagcgt cgcgtcgggc caagcgaagc agacggcacg gcatctctgt cgctgcctct 720

ggacccctct cgagagttcc gctccaccgt tggacttgct ccgctgtcgg catccagaaa 780

ttgcgtggcg gagcggcaga cgtgagccgg cacggcaggc ggcctcctcc tcctctcacg 840

gcacggcagc tacgggggat tcctttccca ccgctccttc gctttccctt cctcgcccgc 900

cgtaataaat agacaccccc tccacaccct ctttccccaa cctcgtgttg ttcggagcgc 960

acacacacac aaccagatct cccccaaatc cacccgtcgg cacctccgct tcaaggtacg 1020

ccgctcgtcc tccccccccc cccctctcta ccttctctag atcggcgttc cggtccatgc 1080

atggttaggg cccggtagtt ctacttctgt tcatgtttgt gttagatccg tgtttgtgtt 1140

agatccgtgc tgctagcgtt cgtacacgga tgcgacctgt acgtcagaca cgttctgatt 1200

gctaacttgc cagtgtttct ctttggggaa tcctgggatg gctctagccg ttccgcagac 1260

gggatcgatt tcatgatttt ttttgtttcg ttgcataggg tttggtttgc ccttttcctt 1320

tatttcaata tatgccgtgc acttgtttgt cgggtcatct tttcatgctt ttttttgtct 1380

tggttgtgat gatgtggtct ggttgggcgg tcgttctaga tcggagtaga attctgtttc 1440

aaactacctg gtggatttat taattttgga tctgtatgtg tgtgccatac atattcatag 1500

ttacgaattg aagatgatgg atggaaatat cgatctagga taggtataca tgttgatgcg 1560

ggttttactg atgcatatac agagatgctt tttgttcgct tggttgtgat gatgtggtgt 1620

ggttgggcgg tcgttcattc gttctagatc ggagtagaat actgtttcaa actacctggt 1680

gtatttatta attttggaac tgtatgtgtg tgtcatacat cttcatagtt acgagtttaa 1740

gatggatgga aatatcgatc taggataggt atacatgttg atgtgggttt tactgatgca 1800

tatacatgat ggcatatgca gcatctattc atatgctcta accttgagta cctatctatt 1860

ataataaaca agtatgtttt ataattattt tgatcttgat atacttggat gatggcatat 1920

gcagcagcta tatgtggatt tttttagccc tgccttcata cgctatttat ttgcttggta 1980

ctgtttcttt tgtcgatgct caccctgttg tttggtgtta cttctgcagg gtacccccgg 2040

ggtcgaccat ggtgatgaga cgctacaagc tctttctcat gttctgtatg gccggcctgt 2100

gcctcatctc cttcctgcac ttcttcaaga ccctgtccta tgtcaccttc ccccgagaac 2160

tggcctccct cagccctaac ctggtgtcca gctttttctg gaacaatgcc ccggtcacgc 2220
cccaggccag
cgcccctgct 
tggtgctgcc 
tcaaacccgg 

gcacgggggg 
acggacccag 
ggctggtgcc 
tcgacctgct 
gcgagtccaa 
ccaatggcac 
cgcccggcgg 
aggacggcgt 
cggacgagat 
agcccttcgc 
ccctggaggt 
gcatccgcct 
gcaccggcca 
actgctcctg 
gcgacttccc 
tgatccgcac 
gcgagcacat 
tggacaaccc 
ccgagggaag 
tctcagaaga 
atgtgctgaa 
tgattttatc 
attaataact 
acctgtgatg 
ccacctgttt 
gcgctgtgct 
ccacggtggc 
caacgtcgtg 
cctttcgcca 
cgcagcctga 
atttcacacc 
cagccccgac 
tccgcttaca 
tcatcaccga 
gtcatgataa 
acccctattt 
ccctgataaa 
gtcgccctta 
ctggtgaaag 
gatctcaaca 
agcactttta 
caactcggtc 
gaaaagcatc 
agtgataaca 
gcttttttgc 
aatgaagcca 
ttgcgcaaac 
tggatggagg 
tttattgctg 
gggccagatg 
atggatgaac 
ctgtcagacc 
aaaaggatct 
ttttcgttcc 
ttttttctgc 
tgtttgccgg 
cagataccaa 
gtagcaccgc 
gataagtcgt 
tcgggctgaa 
ctgagatacc 
gacaggtatc 
ggaaacgcct 
tttttgtgat 
ttacggttcc 
gattctgtgg 



ccccgagcca 
gcagccgctg 
cgaggacacc 
caccaagatg 
cggctcctcg 
ccgaggcgcc 
ctgcggcgtg 
cagggaggtg 
ggacgtgcgc 
cttcacggct 
cttcgagtac 
ccggcaggac 
ctcgcggctg 
cccggcccgt 
cttccacatg 
ggtgtcaggc 
gcgccgccgc 
catcctggtg 
gtgcttcacg 
acgctggggt 
cgggggctgg 
gtatgcgccc 
ctaccaggag 
gccgcccgcc 
ggatctgaat 
ttgtttgtga 
atatgatcga 
caatgtataa 
ttttgactcc 
cttgaagagt 
ttgtaacata 
cagatccact 
actgggaaaa 
gctggcgtaa 
atggcgaatg 
gcatatggtg 
acccgccaac 
gacaagctgt 
aacgcgcgag 
taatggtttc 
gtttattttt 
tgcttcaata 
ttcccttttt 
taaaagatgc 
gcggtaagat 
aagttctgct 
gccgcataca 
ttacggatgg 
ctgcggccaa 
acaacatggg 
taccaaacga 
tattaactgg 
cggataaagt 
ataaatctgg 
gtaagccctc 
gaaatagaca 
aagtttactc 
aggtgaagat 
actgagcgtc 
gcgtaatctg 
atcaagagct 
atactgtcct 
ctacatacct 
gtcttaccgg 
cggggggttc 
tacagcgtga 
cggtaagcgg 
ggtatcttta 
gctcgtcagg 
tggccttttg 
ataaccgtat 



ggaggccctg 

ccgcccagca 
accgagtatt 
ctggagaggc 
gcccggcggc 
cggcgcaagt 
cccactgtgg 
ccgcgccgcg
ttccacgagc 
tatggggagc 
atccgccaca 
ggctggatcg 
cgcaacctgc 
gacggcgtcc 
cgcaagtcgc 
tgcacggtgg 
cagtactaca 
cagtggtcgc 
cccgagggca 
gactacgagg 
ttcgacggca 
aagtacctgc 
cccaggagca 
cggggcaaac 
taggatccta 
tgttggtggc 
tctttggggt 
gatgtgttca 
aaaaaccaaa 
tcatctacca 
acaattgttc 
agttctagag 
ccctggcgtt 
tagcgaagag 
gcgcctgatg 
cactctcagt 
acccgctgac 
gaccgtctcc 
acgaaagggc 
ttagacgtca 
ctaaatacat 
atattgaaaa 
tgcggcattt 
tgaagatcag 
ccttgagagt 
atgtggcgcg 
ctattctcag 
catgacagta 
cttacttctg 
ggatcatgta 
cgagcgtgac 
cgaactactt 
tgcaggacca 
agccggtgag 
ccgtatcgta 
gatcgctgag 
atatatactt 
cctttttgat 
agaccccgta 
ctgcttgcaa 
accaactctt 
tctagtgtag 
cgctctgcta 
gttggactca 
gtgcacacag 
gcattgagaa 
cagggtcgga 
tagtcctgtc 
ggggcggagc 
ctggcctttt 
taccgccttt 



acctgctgcg 
aggcggccga 
tcgtgcgcac 
cccccccggg 
caccccggta 
gggtggagtg 
tgcagtactc 
tcatcaacgc 
tgggcgacgt 
cgcggccgct 
aggtgctcta 
ccgacgacta 
ggcccgacga 
ttttcctcaa 
tctacggctt 
acatgctgca 
ccatgcccaa 
tgggcagccc 
tctacttcaa 
acaagcggga 
cgcagcagga 
tgaagaacta 
cggcggcggg 
tggacgaggc 
ggtttaaact 
gtattttgtt 
tttatttaac 
ttcttcggtt 
atcacaactc 
ttccagttgg 
acggcatata 
cggccgctta 
acccaactta 
gcccgcaccg 
cggtattttc 
acaatctgct 
gcgccctgac 
gggagctgca 
ctcgtgatac 
ggtggcactt 
tcaaatatgt 
aggaagagta 
tgccttcctg 
ttgggtgcac 
tttcgccccg 
gtattatccc 
aatgacttgg 
agagaattat 
acaacgatcg 
actcgccttg 
accacgatgc 
actctagctt 
cttctgcgct 
cgtgggtctc 
gttatctaca 
ataggtgcct 
tagattgatt 
aatctcatga 
gaaaagatca 
acaaaaaaac 
tttccgaagg 
ccgtagttag 
atcctgttac 
agacgatagt 
cccagcttgg 
agcgccacgc 
acaggagagc 
gggtttcgcc 
ctatggaaaa 
gctcacatgt 
gagtgagctg 



taccccactc 
ggagctccac 
caaggccggc 
acggccggag 
cctcctgagc 
cgtgtgcctg 
caacctgccc 
catcaacgtc 
ggtggacgcc 
caagttccgg 
tgtcttcctg 
cctgcgcacc 
cgtcttcatc 
gctctacgat 
cttctggaag 
ggcagtgtat 
cttcagacag 
cctgcacttc 
gctcgtgtcc 
cctgaactac 
gtacccgcct 
cgaccggttc 
cgggtggcgc 
ggaagtcgaa 
gagggcactg 
taaataagta 
acattgtaaa 
gccatagatc 
aataaactca 
catttatcag 
tccacggccg 
attcactggc 
atcgccttgc 
atcgcccttc 
tccttacgca 
ctgatgccgc 
gggcttgtct 
tgtgtcagag 
gcctattttt 
ttcggggaaa 
atccgctcat 
tgagtattca 
tttttgctca 
gagtgggtta 
aagaacgttt 
gtattgacgc 
ttgagtactc 
gcagtgctgc 
gaggaccgaa 
atcgttggga 
ctgtagcaat 
cccggcaaca 
cggcccttcc 
gcggtatcat 
cgacggggag 
cactgattaa 
taaaacttca 
ccaaaatccc 
aaggatcttc 
caccgctacc 
taactggctt 
gccaccactt 
cagtggctgc 
taccggataa 
agcgaacgac 
ttcccgaagg 
gcacgaggga 
acctctgact 
acgccagcaa 
tctttcctgc 
ataccgctcg 



tactcccact 
cgggtggact 
ggcgtctgct 
gagaagcctg 
gcccgggagc 
cccggctggc 
accaaggagc 
aaccacgagt 
tttgtggtgt 
gagatgctga 
gaccacttcc 
ttcctcaccc 
attgacgatg 
ggctggaccg 
cagccgggca 
gggctggacg 
tatgagaacc 
gccggctggc 
gcccagaatg 
atccgcggcc 
gcagacccca 
cactacctgc 
cacaggggtc 
caaaaactca 
aagtcgcttg 
agcatggctg 
atgtgtatct 
tgcttatttg 
tggaatatgt 
tgttgcagcg 
gcctagctag 
cgtcgtttta 
agcacatccc 
ccaacagttg 
tctgtgcggt 
atagttaagc 
gctcccggca 
gttttcaccg 
ataggttaat 
tgtgcgcgga 
gagacaataa 
acatttccgt 
cccagaaacg 
catcgaactg 
tccaatgatg 
cgggcaagag 
accagtcaca 
cataaccatg 
ggagctaacc 
accggagctg 
ggcaacaacg 
attaatagac 
ggctggctgg 
tgcagcactg 
tcaggcaact 
gcattggtaa 
tttttaattt 
ttaacgtgag 
ttgagatcct 
agcggtggtt 
cagcagagcg 
caagaactct 
tgccagtggc 
ggcgcagcgg 
ctacaccgaa 
gagaaaggcg 
gcttccaggg 
tgagcgtcga 
cgcggccttt 
gttatcccct 
ccgcagccga 



2280
2400
2520
2580
2640
2700
2760
2820
2880
3000
3060
3120
3180
3240
3300
3360
3420
3480
3540
3600
3660
3720
3780
3840
3900
3960
4020
4080
4140
4200
4260
4320
4380
4440
4500
4560
4620
4680
4740
4800
4860
4920
4980
5040
5100
5160
5220
5280
5340
5400
5460
5520
5580
5640
5700
5760
5820
5880
5940
6000
6060
6120
6180
6240
6300
6360
6420
6480
acgaccgagc gcagcgagtc agtgagcgag gaagcggaag agcgcccaat acgcaaaccg 6540

cctctccccg cgcgttggcc gattcattaa tgcagctggc acgacaggtt tcccgactgg 6600

aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc tcactcatta ggcaccccag 6660

gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa ttgtgagcgg ataacaattt 6720

cacacaggaa acagctatga ccatgattac gccaagctag cggccgcatt cccgggaagc 6780

taggccaccg tggcccgcct gcaggggaag cttgcatg 6818



<210> 10

<211> 7545

<212> DNA

<213> Artificial Sequence

<220>

<223> Synthetic

<400> 10

cgattaaaaa 

gaaccaaacc 

aattttcttt 

aaatatgaag 

tcaagtgtta 

tcaaaattct 

tagaacatat 

aactttgaaa 

gtgttattaa 

attaattttt 

aatattcatc 

tccaaaccga 

ttgcacccct 

tatcctggaa 

agctcgatgt 

taatttctgc 

ataaagtgat 

gacttggaac 

atggcagttt 

ctattgggaa 

gcaagcggcc 

ggtcaatccc 

tgcttgagaa 

agtgaaaaca 

aagtgaaatt 

atacgtcatt 

tatttgagtc 

atatctttaa 

ggcgaattct 

gggtattttt 

cccctaaagt 

caacccaacc 

tatcaccgtg 

aaaaaaaaga 

gaggatcgcg 

tcgccactat 

accaccacct 

ccgccgccgc 

cgtctcggtc 

gcccagatcg 

ccggcccgga 

gggggagatg 

aaagggcact 
tagatctttc 
agtttttctt 
accatggctt 
gccgcggttt 
gagccacaaa 
tggttggttg 
gctaggaacg 
aggttgggcc 
tttaagtctg 
gctttgggat 
catgatgttg 



tctcaattat 

aaaatataaa 

aaaaaatatc 

tgctccattt 

ctaaaatgcg 

tatatatctt 

cattatttag 

gtgtacatca 

gaaaattctc 

actaacacat 

taacaaaaaa 

tatagttggt 

aatcataata 

attttgcaaa 

ggtggtaata 

taggaagaag 

tgaagctcga 

aaaagaaagt 

tcctttgcat 

cttcttctga 

gcttaattaa 

attgcttttg 

gagagtcggg 

tcagttaaaa 

tactcttttc 

tttgtatgaa 

gggttttaag 

aaaaacccat 

cacaatgaac 

tctagtaaaa 

tcctaaagcc 

caacccaccc 

agttgtccgc 

aaaagaaaaa 

agccagcgac 

atacataccc 

ccacctcctc 

gccggtaacc 

tcgatctttg 

gtgcgcggga 

tctcgcgggg 

atggggggtt 

atggtttata 

tttcttcttt 

ttcatgattt 

ctccggagag 

gtgatatcgt 

caccacaaga 

ctgaggttga 

cttacgattg 

taggatccac 

tggttgctgt 

acacagcccg 

gtttttggca 



atttggtcta 
tatatagttt 
tagaaatatt 
ttattaactt 
tcaatctctt 
tttcgaattt 
gtatcatatt 
acgaaaaatt 
ctataagaat 
atatttactt 
aaaaccagaa 
ttggtttgat 
gctttaatat 
atgaatcaag 
tgtaatttac 
gttagctacg 
aatatacgaa 
gatatatttt 
gtaactatta 
aaatagtggc 
atttaaatgt 
aagcagctca 
atagtccaaa 
ggtggtataa 
tactattata 
ttggttttta 
ttcgtttgct 
atgctaattt 
aataataaga 
ataaaagata 
caaagtgcta 
cagtccagcc 
acgcaccgca 
acagcaggtg 
gaggccggcc 
ccccctctcc 
ccccctcgct 
accccgcccc 
gccttggtag 
ggggcgggat 
aatggggctc 
taaaatttcc 
tttttatata 
ttgtgggtag 
gtgacaaatg 
gagaccagtt 
taaccattac 
gtggattgat 
gggtgttgtg 
gacagttgag 
attgtacaca 
tataggcctt 
gggtacattg 
aagggatttt 



atttagtttg 

ttatatatat 

tgcgactctt 

taaataattg 

tgttcttcca 

gaagtgaaat 

gatttttata 

agtcaaacga 

attttaatag 

atcaaaaatt 

aatgctgaaa 

tttgatataa 

ttcaagatat 

cctatatggc 

ttgattctaa 

atttacagca 

ggaacaaata 

ttgttcttaa 

tgctcccttc 

caccgcttaa 

ttaaactagg 

acattgatct 

ataaaacaaa 

agtaaaatat 

aaaattgagg 

agtttattcg 

tttgtaaata 

gacataattt 

ttaaaatagc 

aacttagact 

tccacgatcc 

aactggacaa 

cgtctcgcag 

ggtccgggtc 

ctccctccgc 

tcccatcccc 

gccggacgac 

tctcctcttt 

tttgggtggg 

ctcgcggctg 

tcggatgtag 

gccatgctaa 

tttctgctgc 

aatttgaatc 

cagcctcgtg 

gagattaggc 

attgagacgt 

gatctagaga 

gctggtattg 

agtactgttt 

catttgctta 

ccaaacgatc 

cgcgcagctg 

gagttgccag 



gtattgagta 

gcctttaaga 

ctggcatgta 

gttgtacgat 

tattcatatg 

ttcgataatt 

cttaattact 

ctaaaataaa 

atcatatgtt 

tgacaaagta 

acccggcaaa 

accgaaccaa 

tattaagtta 

tgtaatatga 

aaaaatatcc 

aagccagaat 

tttttaaaaa 

acaagcatcc 

gttacaaaaa 

ttaaggcgcg 

aaatccaagc 

ctttctcgag 

ggtaagatta 

cggtaataaa 

atgtttttgt 

cttttggaaa 

cagagggatt 

ttgagaaaaa 

tttcccccgt 

caaaacattt 

atagcaagcc 

tagtctccac 

ccaaaaaaaa 

gtgggggccg 

ttccaaagaa 

ccaaccctac 

gcctcccccc 

ctttctccgt 

cgagaggcgg 

gggctctcgc 

atctgcgatc 

acaagatcag 

ttcgtcaggc 

cctcagcatt 

cggagctttt 

cagctacagc 

ctacagtgaa 

ggttgcaaga 

cttacgctgg 

acgtgtcaca 

agtctatgga 

catctgttag 

gatacaagca 

ctcctccaag 



aaacaaattc 

ctttttatag 

atatttcgtt 

cactttctta 

tcaaaaccta 

taaaattaaa 

aaatttggtt 

taaatatcat 

tgtaaaaaaa 

agattaaaat 

accgaaccaa 

ctcggtccat 

acgttgtcaa 

atttaaaagc 

caagtattaa 

acaatgaacc 

aatacgcaat 

cctctaaaga 

ttttggacta 

ccatgcccgg 

ttgggctgca 

gtcattcata 

cctggtcaaa 

aggtggccca 

cggtactttg 

tgcatatctg 

tgtataagaa 

tatatattca 

tgcagcgcat 

acaaaaacaa 

cagcccaacc 

acccccccac 

aaaaagaaag 

gaaacgcgag 

acgcccccca 

caccaccacc 

tccccctccg 

tttttttttc 

cttcgtgcgc 

cggcgtggat 

cgccgttgtt 

gaagagggga 

ttagatgtgc 

gttcatcggt 

ttgtaggtag 

agctgatatg 

ctttaggaca 

tagataccct 

gccctggaag 

taggcatcaa 

ggcgcaaggt 

gttgcatgag 

tggtggatgg 

gccagttagg 



120
180
240
300
360
420
480
540
600
660
720
780
840
900
960
1020
1080
1140
1200
1260
1320
1380
1440
1500
1560
1620
1680
1740
1800
1860
1920
2040
2100
2160
2220
2280
2340
2400
2460
2520
2580
2640
2700
2760
2820
2880
2940
3000
3060
3120
3180
3240
ccagttaccc
gttctggccg 
tctggtttgc 
gtgctttaaa 
gtatgggctt 
gcagttgcgt 
ttgctttctt 
ttgccgtggc 
ggagcataat 
gaacaaaaaa 
tgttccttcg 
gtaaatcgta 
aatcaagtaa 
tataggcttg 
cttgaaatat 
atcaaaatca 
agcattttct 
tttgataagt 
taaaatattc 
ttgactaatt 
aaaatcaata 
cacttcaaat 
agaacaaaga 
tatttaaagt 
gtcgcaaata 
atataaaaac 
ctaaattaga 
ttggcgtaat 
cacaacatac 
ctcacattaa 
ctgcattaat 
gctgcgcacg 
gttcggctgc 
tcaggggata 
aaaaaggccg 
aatcgacgct 
ccccctggaa 
tccgcctttc 
agttcggtgt 
gaccgctgcg 
tcgccactgg 
acagagttct 
tgcgctctgc 
caaaccaccg 
aaaggatctc 
aactcacgtt 
ttaaattaaa 
agttaccaat 
atagttgcct 
cccagtgctg 
aaccagccag 
cagtctatta 
aacgttgttg 
ttcagctccg 
gcggttagct 
ctcatggtta 
tctgtgactg 
tgctcttgcc 
ctcatcattg 
tccagttcga 
agcgtttctg 
acacggaaat 
ggttattgtc 
gttccgcgca 
acattaacct 
gacggtgaaa 
gatgccggga 
tggcttaact 
ataccgcaca 
cgcaactgtt 

gggggatgtg 



agatctgagg 
gccgggcctt 
tttaatttta 
gacccaccgg 
tatttgcttc 
attgtaattc 
ctaaagatct 
ctattttcag 
agttacatgc 
tatatcactt 
tatatttcga 
gctaaccttc 
attacatatt 
attcattttg 
taaagctatt 
aaccaaacca 
ggtttttttt 
aaatatatgt 
ttataggaga 
tttcgttgat 
tgatacctaa 
tcgaaaaaga 
gattgacgca 
taataaaaat 
tttctagata 
tatatattta 
ccaaatataa 
catggtcata 
gagccggaag 
ttgcgttgcg 
gaatcggcca 
ctgcgcacgc 
ggcgagcggt 
acgcaggaaa 
cgttgctggc 
caagtcagag 
gctccctcgt 
tcccttcggg 
aggtcgttcg 
ccttatccgg 
cagcagccac 
tgaagtggtg 
tgaagccagt 
ctggtagcgg 
aagaagatcc 
aagggatttt 
aatgaagttt 
gcttaatcag 
gactccccgt 
caatgatacc 
ccggaagggc 
attgttgccg 
ccattgctac 
gttcccaacg 
ccttcggtcc 
tggcagcact 
gtgagtactc 
cggcgtcaat 
gaaaacgttc 
tgtaacccac 
ggtgagcaaa 
gttgaatact 
tcatgagcgg 
catttccccg 
ataaaaatag 
acctctgaca 
gcagacaagc 
atgcggcatc 
gatgcgtaag 
gggaagggcg 
ctgcaaggcg 



taccctgagc 
gggcgcgcga 
ccaagtttgt 
cactggcagt 
tggatgttgt 
agctgggcta 
ttaagtgctg 
aagaagttcc 
aaaggaaaac 
tcttttgttc 
gcttcaatca 
ttcctagcag 
accaccacat 
caaaatttcc 
atgattaggg 
actatatcgg 
ttgttagatg 
gttagtaaaa 
attttcttaa 
gtacactttc 
ataatgatat 
tatataagaa 
ttttagtaac 
ggagcacttc 
ttttttaaag 
tattttggtt 
ttgggatttt 
gctgtttcct 
cataaagtgt 
ctcactgccc 
acgcgcgggg 
tgcgcacgct 
atcagctcac 
gaacatgtga 
gtttttccat 
gtggcgaaac 
gcgctctcct 
aagcgtggcg 
ctccaagctg 
taactatcgt 
tggtaacagg 
gcctaactac 
taccttcgga 



tggttttttt 
tttgatcttt 
ggtcatgaga 
taaatcaatc 
tgaggcacct 
cgtgtagata 
gcgagaccca 
cgagcgcaga 
ggaagctaga 
aggcatcgtg 
atcaaggcga 
tccgatcgtt 
gcataattct 
aaccaagtca 
acgggataat 
ttcggggcga 
tcgtgcaccc 
aacaggaagg 
catactcttc 
atacatattt 
aaaagtgcca 
gcgtatcacg 
catgcagctc 
ccgtcagggc 
agagcagatt 
gagaaaatac 
atcggtgcgg 
attaagttgg 



tcggtcgcag 
tcagaagcgt 
ttcaaggtgg 
gagtgttgct 
gtactacttg 
cctggacatt 
aattcatatt 
caatagtagt 
tgccattctt 
caagtcattg 
ctttatggtt 
aaattattaa 
cgagctgctt 
aggatattga 
gtgcaaatgg 
tttggattgg 
aatattattt 
attaattttt 
taacacatga 
aaagttaacc 
gttctattta 
ttttgataga 
acttgataag 
atatttaacg 
aaaattctat 
tggttcgaat 
taatcgcggc 
gtgtgaaatt 
aaagcctggg 
gctttccagt 
agaggcggtt 
tcctcgctca 
tcaaaggcgg 
gcaaaaggcc 
aggctccgcc 
ccgacaggac 
gttccgaccc 
ctttctcata 
ggctgtgtgc 
cttgagtcca 
attagcagag 
ggctacacta 
aaaagagttg 
gtttgcaagc 
tctacggggt 
ttatcaaaaa 
taaagtatat 
atctcagcga 
actacgatac 
cgctcaccgg 
agtggtcctg 
gtaagtagtt 
gtgtcacgct 
gttacatgat 
gtcagaagta 
cttactgtca 
ttctgagaat 
accgcgccac 
aaactctcaa 
aactgatctt 
caaaatgccg 
ctttttcaat 
gaatgtattt 
cctgacgtct 
aggccctttc 
ccggagacgg 
gcgtcagcgg 
gtactgagag 
cgcatcaggc 
gcctcttcgc 
gtaacgccag 



cgtgtgcgtg 
tgcgttggcg 
atcgcgtggt 
gcttgtgtag 
ggtttgttga 
gttatgtatt 
tcctcctgca 
ccaaaatttt
tagaggggat 
cgtatttttt 
ctttgtattc 
tacttgggat 
ttaaattcat 
caacgttaac 
accgagttgg 
ttcggttttg 
taatcttact 
tttacaaaca 
tatttattta 
aaatttagta 
attttaaatt 
ttttgacata 
aaagtgatcg 
aaatattaca 
aaaaagtctt 
ttgttttact 
ccactagtca 
gttatccgct 
gtgcctaatg 
cgggaaacct 
tgcgtattgg 
ctgactcgct 
taatacggtt 
agcaaaaggc 
cccctgacga 
tataaagata 
tgccgcttac 
gctcacgctg 
acgaaccccc 
acccggtaag 
cgaggtatgt 
gaaggacagt 
gtagctcttg 
agcagattac 
ctgacgctca 
ggatcttcac 
atgagtaaac 
tctgtctatt 
gggagggctt 
ctccagattt 
caactttatc 
cgccagttaa 
cgtcgtttgg 
cccccatgtt 
agttggccgc 
tgccatccgt 
agtgtatgcg 
atagcagaac 
ggatcttacc 
cagcatcttt 
caaaaaaggg 
attattgaag 
agaaaaataa 
aagaaaccat 
gtctcgcgcg 
tcacagcttg 
gtgttggcgg 
tgcaccatat 
gccattcgcc 
tattacgcca 
ggttttccca 



tccgtcgtac 
tgtgtgtgct 
caaggcccgt 
gctttggtac 
attattatga 
aataaatgct 
gggtttaaac 
tgtaacgaag 
gcttgtttaa 
taaaaatatt 
tggctttgct 
atttttttag 
attacagcca 
ttaataatat 
ttcggtttat 
ccgggttttc 
ttgtcaaatt 
tatgatctat 
ttttagtcgt 
attaagtata 
atcgaaattt 
tgaatatgga 
tacaaccaat 
tgccagaaga 
aaaggcatat 
caataccaaa 
ccggtgtagc 
cacaattcca 
agtgagctaa 
gtcgtgccag 
gcgctcttcc 
gcgctcggtc 
atccacagaa 
caggaaccgt 
gcatcacaaa 
ccaggcgttt 
cggatacctg 
taggtatctc 
cgttcagccc 
acacgactta 
aggcggtgct 
atttggtatc 
atccggcaaa 
gcgcagaaaa 
gtggaacgaa 
ctagatcctt 
ttggtctgac 
tcgttcatcc 
accatctggc 
atcagcaata 
cgcctccatc 
tagtttgcgc 
tatggcttca 
gtgcaaaaaa 
agtgttatca 
aagatgcttt 
gcgaccgagt 
tttaaaagtg 
gctgttgaga 
tactttcacc 
aataagggcg 
catttatcag 
acaaataggg 
tattatcatg 
tttcggtgat 
tctgtaagcg 
gtgtcggggc 
gcggtgtgaa 
attcaggctg 
gctggcgaaa 
gtcacgacgt 



3300
3360
3420
3480
3540
3600
3660
3720
3780
3840
3900
3960
4020
4080
4200
4260
4320
4380
4440
4500
4560
4620
4680
4740
4800
4860
4980
5040
5100
5160
5220
5280
5340
5400
5460
5520
5580
5640
5700
5760
5820
5880
5940
6000
6060
6120
6180
6240
6300
6360
6420
6480
6540
6600
6660
6720
6780
6840
6900
6960
7020
7080
7140
7200
7260
7320
7380
7440
7500
tgtaaaacga cggccagtga attacaccgg tgtgatcatg ggccg 7545



<210> 11

<211> 11643

<212> DNA

<213> Artificial Sequence

<220>

<223> Synthetic

<400> 11
cgattaaaaa 
aaccaaacca 
attttcttta 
aatatgaagt 
caagtgttac 
caaaattctt 
agaacatatc 
actttgaaag 
tgttattaag 
ttaattttta 
atattcatct 
ccaaaccgat 
tgcaccccta 
atcctggaaa 
gctcgatgtg 
aatttctgct 
taaagtgatt 
acttggaaca 
tggcagtttt 
tattgggaat 
cagatccccg 
ctctagagat 
tcacacttgt 
gaataatata 
cagttagaca 
ctttttagtg 
catccatttt 
tttttagtac 
tttagttttt 
taaacaaata 
gataatgcca 
cagcgtcgcg 
ccctctcgag 
gtggcggagc 
ggcagctacg 
ataaatagac 
acacacaacc 
tcgtcctccc 
ttagggcccg 
ccgtgctgct 
acttgccagt 
tcgatttcat 
tcaatatatg 
tgtgatgatg 
tacctggtgg 
gaattgaaga 
ttactgatgc 
gggcggtcgt 
ttattaattt 
gatggaaata 
catgatggca 
taaacaagta 
cagctatatg 
ttcttttgtc 
gaccatggtg 
catctccttc 
ctccctcagc 
ggccagcccc 
cctgctgcag 



cccaattata 
aaatataaat 
aaaaatatct 
gctccatttt 
taaaatgcgt 
atatatcttt 
attatttagg 
tgtacatcaa 
aaaattctcc 
ctaacacata 
aacaaaaaaa 
atagttggtt 
atcataatag 
ttttgcaaaa 
gtggtaatat 
aggaagaagg 
gaagctcgaa 
aaagaaagtg 
cctttgcatg 
tcttctgaaa 
gggatcctct 
aatgagcatt 
ttgaagtgca 
atctatagta 
tggtctaaag 
tgcatgtgtt 
attagtacat 
atctatttta 
ttatttaata 
ccctttaaga 
gcctgttaaa 
tcgggccaag 
agttccgctc 
ggcagacgtg 
ggggattcct 
accccctcca 
agatctcccc 
cccccccccc 
gtagttctac 
agcgttcgta 
gtttctcttt 
gatttttttt 
ccgtgcactt 
tggtctggtt 
atttattaat 
tgatggatgg 
atatacagag 
tcattcgttc 
tggaactgta 
tcgatctagg 
tatgcagcat 
tgttttataa 
tggatttttt 
gatgctcacc 
atgagacgct 
ctgcacttct 
cctaacctgg 
gagccaggag 
ccgctgccgc 



tttggtctaa 
atatagtttt 
agaaatattt 
tattaacttt 
caatctcttt 
ttcgaatttg 
tatcatattg 
cgaaaaatta 
tataagaata 
tatttactta 
aaaccagaaa 
tggtttgatt 
ctttaatatt 
tgaatcaagc 
gtaatttact 
ttagctacga 
atatacgaag 
atatattttt 
taactattat 
atagtggcca 
agagtcgacc 
gcatgtctaa 
gtttatctat 
ctacaataat 
gacaattgag 
ctcctttttt 
ccatttaggg 
ttctatttta 
atttagatat 
aattaaaaaa 
cgccgtcgac 
cgaagcagac 
caccgttgga 
agccggcacg 
ttcccaccgc 
caccctcttt 
caaatccacc 
tctctacctt 
ttctgttcat 
cacggatgcg 
ggggaatcct 
gtttcgttgc 
gtttgtcggg 

gggcggtcgt 

tttggatctg 
aaatatcgat 
atgctttttg 
tagatcggag 
tgtgtgtgtc 
ataggtatac 
ctattcatat 
ttattttgat 
tagccctgcc 
ctgttgtttg 
acaagctctt 
tcaagaccct 
tgtccagctt 
gccctgacct 
ccagcaaggc 



tttagtttgg 
tatatatatg 
gcgactcttc 
aaataattgg 
gttcttccat 
aagtgaaatt 
atttttatac 
gtcaaacgac 
ttttaataga 
tcaaaaattt 
atgctgaaaa 
ttgatataaa 
tcaagatatt 
ctatatggct 
tgattctaaa 
tttacagcaa 
gaacaaatat 
tgttcttaaa 
gctcccttcg 
ccgcttaatt 
tgcagtgcag 
gttataaaaa 
ctttatacat 
atcagtgttt 
tattttgaca 
tttgcaaata 
tttagggtta 
gcctctaaat 
aaaatagaat 
actaaggaaa 
gagtctaacg 
ggcacggcat 
cttgctccgc 
gcaggcggcc 
tccttcgctt 
ccccaacctc 
cgtcggcacc 
ctctagatcg 
gtttgtgtta 
acctgtacgt 
gggatggctc 
atagggtttg 
tcatcttttc 
tctagatcgg 
tatgtgtgtg 
ctaggatagg 
ttcgcttggt 
tagaatactg 
atacatcttc 
atgttgatgt 
gctctaacct 
cttgatatac 
ttcatacgct 
gtgttacttc 
tctcatgttc 
gtcctatgtc 
tttctggaac 
gctgcgtacc 
ggccgaggag 



tattgagtaa 
cctttaagac 
tggcatgtaa 
ttgtacgatc 
attcatatgt 
tcgataattt 
ttaattacta 
taaaataaat 
tcatatgttt 
gacaaagtaa 
cccggcaaaa 
ccgaaccaac 
attaagttaa 
gtaatatgaa 
aaaatatccc 
agccagaata 
ttttaaaaaa 
caagcatccc 
ttacaaaaat 
aaggcgcgcc 
cgtgacccgg 
attaccacat 
atatttaaac 
tagagaatca 
acaggactct 
gcttcaccta 
atggttttta 
taagaaaact 
aaaataaagt 
catttttctt 
gacaccaacc 
ctctgtcgct 
tgtcggcatc 
tcctcctcct 
tcccttcctc 
gtgttgttcg 
tccgcttcaa 
gcgttccggt 
gatccgtgtt 
cagacacgtt 
tagccgttcc 
gtttgccctt 
atgctttttt 
agtagaattc 
ccatacatat 
tatacatgtt 
tgtgatgatg 
tttcaaacta 
atagttacga 
gggttttact 
tgagtaccta 
ttggatgatg 
atttatttgc 
tgcagggtac 
tgtatggccg 
accttccccc 
aatgccccgg 
ccactctact 
ctccaccggg 



aacaaattcg 
tttttataga 
tatttcgtta 
actttcttat 
caaaatctat 
aaaattaaat 
aatttggtta 
aaatatcatg 
gtaaaaaaaa 
gattaaaata 
ccgaaccaat 
tcggtccatt 
cgttgtcaat 
tttaaaagca 
aagtattaat 
caaagaacca 
atacgcaatg 
ctctaaagaa 
tttggactac 
atgccccctg 
tcgtgcccct 
attttttttg 
tttaatctac 
tataaatgaa 
acagttttat 
tataatactt 
tagactaatt 
aaaactctat 
gactaaaaat 
gtttcgagta 
agcgaaccag 
gcctctggac 
cagaaattgc 
ctcacggcac 
gcccgccgta 
gagcgcacac 
ggtacgccgc 
ccatgcatgg 
tgtgttagat 
ctgattgcta 
gcagacggga 
ttcctttatt 
ttgtcttggt 
tgtttcaaac 
tcatagttac 
gatgcgggtt 
tggtgtggtt 
cctggtgtat 
gtttaagatg 
gatgcatata 
tctattataa 
gcatatgcag 
ttggtactgt 
ccccggggtc 
gcctgtgcct 
gagaactggc 
tcacgcccca 
cccactcgcc 
tggacttggt 



120
180
240
300
360
420
480
540
600
660
720
780
840
900
960
1020
1080
1140
1200
1260
1320
1380
1440
1500
1560
1620
1680
1740
1800
1860
1920
1980
2040
2100
2160
2220
2280
2340
2400
2460
2520
2580
2640
2700
2760
2820
2880
2940
3000
3060
3120
3180
3240
3300
3360
3420
3480
3540
gctgcccgag
acccggcacc 
ggccaacggc 
ggggggccga 
acccagctgc 
ggtgcccagg 
cctgctggac 
gtccaacttc 
tggcaccttc 
cggcggccgg 
cggcgtctcg 
cgagatcccg 
cttcgccttc 
ggaggtggtg 
ccgcctgcgc 
cggccacatc 
ctcctggtgc 
cttcccacgc 
ccgcaccggg 
gcacatgtat 
caacccctac 
gggaaggccg 
agaagaggat 
gctgaattgt 
tttatcatat 
ataactcaat 
gtgatgtttt 
ctgtttcttg 
tgtgctttgt 
ggtggccaga 
aaatccaagc 
ctttctcgag 
ggtaagatta 
cggtaataaa 
atgtttttgt 
cttttggaaa 
cagagggatt 
ttgagaaaaa 
tttcccccgt 
caaaacattt 
atagcaagcc 
tagtctccac 
ccaaaaaaaa 
gtgggggccg 
ttccaaagaa 
ccaaccctac 
gcctcccccc 
ctttctccgt 
cgagaggcgg 
gggctctcgc 
atctgcgatc 
acaagatcag 
ttcgtcaggc 
cctcagcatt 
cggagctttt 
cagctacagc 
ctacagtgaa 
ggttgcaaga 
cttacgctgg 
acgtgtcaca 
agtctatgga 
catctgttag 
gatacaagca 
ctcctccaag 
cgtgtgcgtg 
tgcgttggcg 
atcgcgtggt 
gcttgtgtag 
ggtttgttga 
gttatgtatt 
tcctcctgca 



gacaccaccg 
aagatgctgg 
tcctcggccc 
ggcgcccggc 
ggcgtgccca 
gaggtgccgc 
gtgcgcttcc 
acggcttatg 
gagtacatcc 
caggacggct 
cggctgcgca 
gcccgtgacg 
cacatgcgca 
tcaggctgca 
cgccgccagt 
ctggtgcagt 
ttcacgcccg 
tggggtgact 
ggctggttcg 
gcgcccaagt 
caggagccca 
cccgcccggg 
ctgaattagg 
ttgtgatgtt 
gatcgatctt 
gtataagatg 
gactccaaaa 
aagagttcat 
aacataacaa 
tccactaggg 
ttgggctgca 
gtcattcata 
cctggtcaaa 
aggtggccca 
cggtactttg 
tgcatatctg 
tgtataagaa 
tatatattca 
tgcagcgcat 
acaaaaacaa 
cagcccaacc 
acccccccac 
aaaaagaaag 
gaaacgcgag 
acgcccccca 
caccaccacc 
tccccctccg 
tttttttttc 
cttcgtgcgc 
cggcgtggat 
cgccgttgtt 
gaagagggga 
ttagatgtgc 
gttcatcggt 
ttgtaggtag 
agctgatatg 
ctttaggaca 
tagataccct 
gccctggaag 
taggcatcaa 
ggcgcaaggt 
gttgcatgag 
tggtggatgg 
gccagttagg 
tccgtcgtac 
tgtgtgtgct 
caaggcccgt 
gctttggtac 
attattatga 
aataaatgct 
gggtttaaac 



agtatttcgt 
agaggccccc 
ggcggccacc 
gcaagtgggt 
ctgtggtgca 
gccgcgtcat 
acgagctggg 
gggagccgcg 
gccacaaggt 
ggatcgccga 
acctgcggcc 
gcgtcctttt 
agtcgctcta 
cggtggacat 
actacaccat 
ggtcgctggg 
agggcatcta 
acgaggacaa 
acggcacgca 
acctgctgaa 
ggagcacggc 
gcaaactgga 
atcctaggtt 
ggtggcgtat 
tggggtttta 
tgttcattct 
accaaaatca 
ctaccattcc 
ttgttcacgg 
gcaagcggcc 
ggtcaatccc 
tgcttgagaa 
agtgaaaaca 
aagtgaaatt 
atacgtcatt 
tatttgagtc 
atatctttaa 
ggcgaattct 
gggtattttt 
cccctaaagt 
caacccaacc 
tatcaccgtg 
aaaaaaaaga 
gaggatcgcg 
tcgccactat 
accaccacct 
ccgccgccgc 
cgtctcggtc 
gcccagatcg 
ccggcccgga 
gggggagatg 
aaagggcact 
tagatctttc 
agtttttctt 
accatggctt 
gccgcggttt 
gagccacaaa 
tggttggttg 
gctaggaacg 
aggttgggcc 
tttaagtctg 
gctttgggat 
catgatgttg 
ccagttaccc
gttctggccg 
tctggtttgc 
gtgctttaaa 
gtatgggctt 
gcagttgcgt 
ttgctttctt 
ttgccgtggc 



gcgcaccaag 
cccgggacgg 
ccggtacctc 
ggagtgcgtg 
gtactccaac 
caacgccatc 
cgacgtggtg 
gccgctca^g 
gctctatgtc 
cgactacctg 
cgacgacgtc 
cctcaagctc 
cggcttcttc 
gctgcaggca 
gcccaacttc 
cagccccctg 
cttcaagctc 
gcgggacctg 
gcaggagtac 
gaactacgac 
ggcgggcggg 
cgaggcggaa 
taaactgagg 
tttgtttaaa 
tttaacacat 
tcggttgcca 
caactcaata 
agttggcatt 
catatatcca 
gcttaattaa 
attgcttttg 
gagagtcggg 
tcagttaaaa 
tactcttttc 
tttgtatgaa 
gggttttaag 
aaaaacccat 
cacaatgaac 
tctagtaaaa 
tcctaaagcc 
caacccaccc 
agttgtccgc 
aaaagaaaaa 
agccagcgac 
atacataccc 
ccacctcctc 
gccggtaacc 
tcgatctttg 
gtgcgcggga 
tctcgcgggg 
atggggggtt 
atggtttata 
tttcttcttt 
ttcatgattt 
ctccggagag 
gtgatatcgt 
caccacaaga 
ctgaggttga 
cttacgattg 
taggatccac 
tggttgctgt 
acacagcccg 
gtttttggca 
agatctgagg 
gccgggcctt 
tttaatttta 
gacccaccgg 
tatttgcttc 
attgtaattc 
ctaaagatct 
ctattttcag 



gccggcggcg 
ccggaggaga 
ctgagcgccc 
tgcctgcccg 
ctgcccacca 
aacgtcaacc 
gacgcctttg 
ttccgggaga 
ttcctggacc 
cgcaccttcc 
ttcatcattg 
tacgatggct 
tggaagcagc 
gtgtatgggc 
agacagtatg 
cacttcgccg 
gtgtccgccc 
aactacatcc 
ccgcctgcag 
cggttccact 
tggcgccaca 
gtcgaacaaa 
gcactgaagt 
taagtaagca 
tgtaaaatgt 
tagatctgct 
aactcatgga 
tatcagtgtt 
cggccggcct 
atttaaatgt 
aagcagctca 
atagtccaaa 
ggtggtataa 
tactattata 
ttggttttta 
ttcgtttgct 
atgctaattt 
aataataaga 
ataaaagata 
caaagtgcta 
cagtccagcc 
acgcaccgca 
acagcaggtg 
gaggccggcc 
ccccctctcc 
ccccctcgct 
accccgcccc 
gccttggtag 
ggggcgggat 
aatggggctc 
taaaatttcc 
tttttatata 
ttgtgggtag 
gtgacaaatg 
gagaccagtt 
taaccattac 
gtggattgat 
gggtgttgtg 
gacagttgag 
attgtacaca 
tataggcctt 
gggtacattg 
aagggatttt 
taccctgagc 
gggcgcgcga 
ccaagtttgt 
cactggcagt 
tggatgttgt 
agctgggcta 
ttaagtgctg 
aagaattccc 



tctgcttcaa 
agcctgaggg 
gggagcgcac 
gctggcacgg 
aggagcggct 
acgagttcga 
tggtgtgcga 
tgctgaccaa 
acttcccgcc 
tcacccagga 
acgatgcgga 
ggaccgagcc 
cgggcaccct 
tggacggcat 
agaaccgcac 
gctggcactg 
agaatggcga 
gcggcctgat 
accccagcga 
acctgctgga 
ggggtcccga 
aactcatctc 
cgcttgatgt 
tggctgtgat 
gtatctatta 
tatttgacct 
atatgtccac 
gcagcggcgc 
agctagccac 
ttaaactagg 
acattgatct 
ataaaacaaa 
agtaaaatat 
aaaattgagg 
agtttattcg 
tttgtaaata 
gacataattt 
ttaaaatagc 
aacttagact 
tccacgatcc 
aactggacaa 
cgtctcgcag 
ggtccgggtc 
ctccctccgc 
tcccatcccc 
gccggacgac 
tctcctcttt 
tttgggtggg 
ctcgcggctg 
tcggatgtag 
gccatgctaa 
tttctgctgc 
aatttgaatc 
cagcctcgtg 
gagattaggc 
attgagacgt 
gatctagaga 
gctggtattg 
agtactgttt 
catttgctta 
ccaaacgatc 
cgcgcagctg 
gagttgccag 
tcggtcgcag 
tcagaagcgt 
ttcaaggtgg 
gagtgttgct 
gtactacttg 
cctggacatt 
aattcatatt 
aatagtagtc 



3600
3720
3780
3840
3900
3960
4020
4080
4140
4200
4320
4380
4440
4500
4560
4680
4740
4800
4860
4920
4980
5040
5100
5160
5220
5280
5340
5400
5460
5520
5580
5640
5700
5760
5820
5880
5940
6000
6060
6120
6180
6240
6300
6360
6420
6480
6540
6600
6660
6720
6780
6840
6900
7020
7080
7140
7200
7260
7320
7380
7440
7500
7560
7620
7680
7740
7800
caaaattttt gtaacgaagg gagcataata gttacatgca aaggaaaact gccattcttt 7860

agaggggatg cttgtttaag aacaaaaaat atatcacttt cttttgttcc aagtcattgc 7920

gtattttttt aaaaatattt gttccttcgt atatttcgag cttcaatcac tttatggttc 7980

tttgtattct ggctttgctg taaatcgtag ctaaccttct tcctagcaga aattattaat 8040

acttgggata tttttttaga atcaagtaaa ttacatatta ccaccacatc gagctgcttt 8100

taaattcata ttacagccat ataggcttga ttcattttgc aaaatttcca ggatattgac 8160

aacgttaact taataatatc ttgaaatatt aaagctatta tgattagggg tgcaaatgga 8220

ccgagttggt tcggtttata tcaaaatcaa accaaaccaa ctatatcggt ttggattggt 8280

tcggttttgc cgggttttca gcattttctg gttttttttt tgttagatga atattatttt 8340

aatcttactt tgtcaaattt ttgataagta aatatatgtg ttagtaaaaa ttaatttttt 8400

ttacaaacat atgatctatt aaaatattct tataggagaa ttttcttaat aacacatgat 8460

atttatttat tttagtcgtt tgactaattt ttcgttgatg tacactttca aagttaacca 8520

aatttagtaa ttaagtataa aaatcaatat gatacctaaa taatgatatg ttctatttaa 8580

ttttaaatta tcgaaatttc acttcaaatt cgaaaaagat atataagaat tttgatagat 8640

tttgacatat gaatatggaa gaacaaagag attgacgcat tttagtaaca cttgataaga 8700

aagtgatcgt acaaccaatt atttaaagtt aataaaaatg gagcacttca tatttaacga 8760

aatattacat gccagaagag tcgcaaatat ttctagatat tttttaaaga aaattctata 8820

aaaagtctta aaggcatata tataaaaact atatatttat attttggttt ggttcgaatt 8880

tgttttactc aataccaaac taaattagac caaatataat tgggttttta atcgcggccc 8940

actagtcacc ggtgtagctt ggcgtaatca tggtcatagc tgtttcctgt gtgaaattgt 9000

tatccgctca caattccaca caacatacga gccggaagca taaagtgtaa agcctggggt 9060

gcctaatgag tgagctaact cacattaatt gcgttgcgct cactgcccgc tttccagtcg 9120

ggaaacctgt cgtgccagct gcattaatga atcggccaac gcgcggggag aggcggtttg 9180

cgtattgggc gctcttccgc tgcgcacgct gcgcacgctg cgcacgcttc ctcgctcact 9240

gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat cagctcactc aaaggcggta 9300

atacggttat ccacagaatc aggggataac gcaggaaaga acatgtgagc aaaaggccag 9360

caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag gctccgcccc 9420

cctgacgagc atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc gacaggacta 9480

taaagatacc aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt tccgaccctg 9540

ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct ttctcatagc 9600

tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg ctgtgtgcac 9660

gaaccccccg ttcagcccga ccgctgcgcc ttatccggta actatcgtct tgagtccaac 9720

ccggtaagac acgacttatc gccactggca gcagccactg gtaacaggat tagcagagcg 9780

aggtatgtag gcggtgctac agagttcttg aagtggtggc ctaactacgg ctacactaga 9840
aggacagtat ttggtatctg cgctctgctg aagccagtta ccttcggaaa aagagttggt 

agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt ttgcaagcag ^bO 

cagattacgc gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc tacggggtct 1D02Q 

gacgctcagt ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg lOOAQ 

atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat lOlMO 

gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc 1D20D 

tgtctatttc gttcatccat agttgcctga ctccccgtcg tgtagataac tacgatacgg lQ2bO 

gagggcttac catctggccc cagtgctgca atgataccgc gagacccacg ctcaccggct 10320 

ccagatttat cagcaataaa ccagccagcc ggaagggccg agcgcagaag tggtcctgca 103A0 

actttatccg cctccatcca gtctattaat tgttgccggg aagctagagt aagtagttcg lOMHD 

ccagttaata gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt gtcacgctcg 10500 

tcgtttggta tggcttcatt cagctccggt tcccaacgat caaggcgagt tacatgatcc lOSbO 

cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt cagaagtaag 10b2a 

ttggccgcag tgttatcact catggttatg gcagcactgc ataattctct tactgtcatg lObAO 

ccatccgtaa gatgcttttc tgtgactggt gagtactcaa ccaagtcatt ctgagaatag 107MD 

tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatac gggataatac cgcgccacat IQAOO 

agcagaactt taaaagtgct catcattgga aaacgttctt cggggcgaaa actctcaagg lOAbO 

atcttaccgc tgttgagatc cagttcgatg taacccactc gtgcacccaa ctgatcttca 10120 

gcatctttta ctttcaccag cgtttctggg tgagcaaaaa caggaaggca aaatgccgca lOiAD 

aaaaagggaa taagggcgac acggaaatgt tgaatactca tactcttcct ttttcaatat 110>40 

tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga atgtatttag 11100 

aaaaataaac aaataggggt tccgcgcaca tttccccgaa aagtgccacc tgacgtctaa lllbO 

gaaaccatta ttatcatgac attaacctat aaaaataggc gtatcacgag gccctttcgt 11220 

ctcgcgcgtt tcggtgatga cggtgaaaac ctctgacaca tgcagctccc ggagacggtc 11260 

acagcttgtc tgtaagcgga tgccgggagc agacaagccc gtcagggcgc gtcagcgggt 113^0 

gttggcgggt gtcggggctg gcttaactat gcggcatcag agcagattgt actgagagtg imOD 

caccatatgc ggtgtgaaat accgcacaga tgcgtaagga gaaaataccg catcaggcgc llMbO 

cattcgccat tcaggctgcg caactgttgg gaagggcgat cggtgcgggc ctcttcgcta 11520 

ttacgccagc tggcgaaagg gggatgtgct gcaaggcgat taagttgggt aacgccaggg 115A0 

ttttcccagt cacgacgttg taaaacgacg gccagtgaat tacaccggtg tgatcatggg llbMO 



<210> 12
<211> 115
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 12
catgattacg ccaagctagc ggccgcattc ccgggaagct aggccaccgt ggcccgcctg 60
caggggaagc ttgcatgcct gcagatcccc ggggatcctc tagagtcgac ctgca 115



<210> 13
<211> 19
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 13
gggtaccccc ggggtcgac 19



<210> 14
<211> 17
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 14
taatgagctc gtttaaa 17



<210> 15
<211> 55
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 15
cggccggcct agctagccac ggtggccaga tccactagtt ctagagcggc cgctt 55



<210> 16
<211> 38
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 16
cctgcagatc cccggggatc ctctagagtc gacctgca 38



<210> 17
<211> 19
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 17
gggtaccccc ggggtcgac 19



<210> 18
<211> 138
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 18
tggccaccgc ttaattaagg cgcgccatgc ccgggcaagc ggccgcttaa ttaaatttaa 60
atgtttaaac taggaaatcc aagcttgggc tgcaggtcaa tcccattgct tttgaagcag 120
ctcaacattg atctcttt 138



<210> 19
<211> 14
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 19
ggtaccctga gctc 14



<210> 20
<211> 41
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 20
gaattcatat ttcctcctgc agggtttaaa cttgccgtgg c 41



<210> 21
<211> 21
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 21
cggcccacta gtcaccggtg t 21



<210> 22
<211> 27
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 22
gcgcacgctg cgcacgctgc gcacgct 27



<210> 23
<211> 22
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 23
acaccggtgt gatcatgggc cg 22


<210> 24
<211> 69
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 24
tggccaccgc ttaattaagg cgcgccatgc cccctgcaga tccccgggga tcctctagag 60
tcgacctgc 69

<210> 25
<211> 144
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 25
cggccggcct agctagccac ggtggccaga tccactaggg gcaagcggcc gcttaattaa 60
atttaaatgt ttaaactagg aaatccaagc ttgggctgca ggtcaatccc attgcttttg 120
aagcagctca acattgatct cttt 144



<210> 26
<211> 14
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 26
ggtaccctga gctc 14



<210> 27
<211> 41
<212> DNA
<213> Artificial Sequence

<220>
<223> Synthetic

<400> 27
gaattcatat ttcctcctgc agggtttaaa cttgccgtgg c 41



